Re: [R] Scatter plot for repeated measures

2014-12-06 Thread arun


I am not sure whether you want a scatterplot or just a plot with three lines.
If it is the latter:

library(reshape2)

# Reshape to wide format (one column per ID), then draw one line per column
matplot(acast(my.df, TIME~ID, value.var='X'), type='l', lty=1, col=1:3,
        ylab='X', xlab='TIME')
legend('bottomright', inset=.05, legend=LETTERS[1:3], lty=1, col=1:3)
A.K.
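
If reshape2 is not available, the same picture can probably be drawn with base R alone. This is only a sketch: it assumes the `my.df` from the original question, and the object name `wide` is made up for illustration.

```r
# Sketch only: base-R version of the reshape-then-matplot idea
my.df <- data.frame(ID = rep(c("A", "B", "C"), 5),
                    TIME = rep(1:5, each = 3), X = 1:5)

# TIME x ID matrix; each cell holds a single X, so mean() just returns it
wide <- with(my.df, tapply(X, list(TIME, ID), mean))

# One line (with points) per ID
matplot(as.numeric(rownames(wide)), wide, type = "b", lty = 1, pch = 1,
        col = 1:3, xlab = "TIME", ylab = "X")
legend("bottomright", inset = .05, legend = colnames(wide),
       lty = 1, pch = 1, col = 1:3)
```

Using type = "b" gives both points and lines, which may be closer to the "scatterplot with lines" that was asked for.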

On Friday, December 5, 2014 5:45 PM, farnoosh sheikhi  
wrote:



Hi Arun,

I hope you are doing well.
I have a data set as follows:
my.df <- data.frame(ID=rep(c("A","B","C"), 5), TIME=rep(1:5, each=3), X=1:5)

I would like to get a scatterplot where the x axis is TIME (1,2,3,4,5) and the y axis 
is X, but I want to have three separate lines, one for each ID.
I basically want to track each ID over time. Is this possible?


Thanks a lot and Happy Holidays to you!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mean calculation

2015-01-26 Thread arun
Hi Juvin,

 The error "dim(X) must have a positive length" usually appears when you 
pass a vector to "apply", i.e.

apply(1:5,2,mean)
#Error in apply(1:5, 2, mean) : dim(X) must have a positive length
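
A small sketch of why this happens and how to avoid it, using only base R. The object names (`m`, `one_col`, and so on) are made up for illustration and are not from the original post.

```r
x <- 1:5
dim(x)                                   # NULL: a plain vector has no dimensions

# apply() refuses input without dimensions; capture the message instead of stopping
msg <- tryCatch(apply(x, 2, mean), error = function(e) conditionMessage(e))

m <- matrix(1:6, nrow = 2)
col_means <- apply(m, 2, mean)           # works: m has two dimensions

# Selecting a single column drops to a vector unless drop = FALSE is given
one_col <- m[, 1, drop = FALSE]
one_mean <- apply(one_col, 2, mean)
```

So if a subsetting step leaves only one column, drop = FALSE keeps the result usable with apply().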



   Also, if your dataset originally has "1206" columns, it is not clear why you 
needed the below code.  ("rainfall" is already a "data.frame")


  precip=data.frame(rainfall[1:1206]) 



Based on the data provided,

rainfall <- read.table(text="1 2 3 4 5 6 7 8 9 10 11
NA 0 0 0 0 12 0 0 0 0 0
NA 0 0 0 0 0 0 0 0 0 0
NA 0 0 0 0 14 0 0 0 0 5
NA 0 0 0 0 0 0 0 0 0 0
NA 0 0 27 0 0 0 0 20 0 165
NA 0 88 38 0 0 0 0 0 0 26
NA 12 12 0 0 0 0 0 0 0 2
NA 2 0 0 0 0 0 0 0 0 0
NA 2 0 0 0 0 0 0 0 0 0
NA 0 24 1 0 0 0 0 3 0 62
NA 26 0 0 0 0 0 0 0 0 33", sep="", header=TRUE, check.names=FALSE)



apply(rainfall, 2, function(x) c(mean=mean(x, na.rm=TRUE),
   median=median(x, na.rm=TRUE), max=max(x, na.rm=TRUE)))

#          1         2        3  4 5         6 7 8         9 10        11
#mean    NaN  3.818182 11.27273  6 0  2.363636 0 0  2.090909  0  26.63636
#median   NA  0.000000  0.00000  0 0  0.000000 0 0  0.000000  0   2.00000
#max    -Inf 26.000000 88.00000 38 0 14.000000 0 0 20.000000  0 165.00000



Or using `colMaxs`, `colMedians` from `matrixStats`

library(matrixStats)
m <- as.matrix(rainfall)   # the matrixStats functions expect a matrix
rbind(mean=colMeans(m, na.rm=TRUE), median=colMedians(m, na.rm=TRUE),
  max=colMaxs(m, na.rm=TRUE))

Another option would be to use `summarise_each` from `dplyr`

library(dplyr)
rainfall %>%
 summarise_each(funs(mean=mean(., na.rm=TRUE), median=median(., na.rm=TRUE),
   max=max(., na.rm=TRUE)))

A.K.


I tried to calculate a mean from a csv table by forming a data frame, 
but it says "dim(X) must have a positive length". The table has 1206 columns and 31 
rows. I want to calculate the mean, median, and maximum from the table. The 
table has some NA values which I don't want to include. The 
table looks as follows: 
1 2 3 4 5 6 7 8 9 10 11
NA 0 0 0 0 12 0 0 0 0 0
NA 0 0 0 0 0 0 0 0 0 0
NA 0 0 0 0 14 0 0 0 0 5
NA 0 0 0 0 0 0 0 0 0 0
NA 0 0 27 0 0 0 0 20 0 165
NA 0 88 38 0 0 0 0 0 0 26
NA 12 12 0 0 0 0 0 0 0 2
NA 2 0 0 0 0 0 0 0 0 0
NA 2 0 0 0 0 0 0 0 0 0
NA 0 24 1 0 0 0 0 3 0 62
NA 26 0 0 0 0 0 0 0 0 33

I used the following code to calculate the mean: 
rainfall=read.table('bmark.csv',header=T,sep=',') 
precip=data.frame(rainfall[1:1206]) 
monthlyMean=apply(precip, MARGIN=2,FUN=mean,na.rm=TRUE) 

Any help would be appreciated. 

Juvin



Re: [R] Is there a way to map data from Binary format to Numerical numbers?

2015-02-01 Thread arun
Try

indx <- which(!!mat, arr.ind=TRUE)
v1 <-unname(sapply(split(indx[,2], indx[,1]),toString))

cat(paste(v1, collapse="\n"), sep="\n")
1, 2, 3, 6, 7, 8, 9
1, 2, 3, 6, 8, 9
1, 3, 4, 6, 7, 8, 9
1, 8
1, 3, 6, 7, 8, 9
1, 3, 4, 6, 8, 9
1, 3, 5, 9


A.K.
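
The reply above assumes a matrix called `mat` that is not shown. A minimal base-R sketch under that assumption, rebuilding `mat` from the records in the question and taking the column positions of the 1s row by row (the name `idx` is made up):

```r
# The seven records from the question, one row per record
mat <- matrix(c(1,1,1,0,0,1,1,1,1,
                1,1,1,0,0,1,0,1,1,
                1,0,1,1,0,1,1,1,1,
                1,0,0,0,0,0,0,1,0,
                1,0,1,0,0,1,1,1,1,
                1,0,1,1,0,1,0,1,1,
                1,0,1,0,1,0,0,0,1), nrow = 7, byrow = TRUE)

# For each row, the positions of the items that are present
idx <- lapply(seq_len(nrow(mat)), function(i) which(mat[i, ] == 1))

# One comma-separated string per record
v1 <- vapply(idx, toString, character(1))
cat(v1, sep = "\n")
```

lapply() is used rather than apply() because apply() would return a matrix instead of a list if every row happened to contain the same number of 1s.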

   



Hi,
Is there a way to map data from Binary format to Numerical numbers?

example:
I have text files, where each record consists of several items (9 items)
1, means item appear
0, means item absent

1,1,1,0,0,1,1,1,1
1,1,1,0,0,1,0,1,1
1,0,1,1,0,1,1,1,1
1,0,0,0,0,0,0,1,0
1,0,1,0,0,1,1,1,1
1,0,1,1,0,1,0,1,1
1,0,1,0,1,0,0,0,1


I want to transform my data to item numbers in ascending order, such that 
when an item is absent it is not printed, but the counter still increases. For 
example, the above binary format will become:
1,2,3,6,7,8,9
1,2,3,6,8,9
1,3,4,6,7,8,9
1,8
1,3,6,7,8,9
1,3,4,6,8,9
1,3,5,9



Re: [R] Difference in dates for unique ID

2015-02-15 Thread arun
Hi Farnoosh,



I am not sure I understand the expected output.  The difference between the first two 
dates is "136 days".

Maybe this helps:

   library(data.table)
   # Per ID: number of visits and absolute gaps (in days) between consecutive
   # dates, then one indicator column per distinct gap
   dcast.data.table(setDT(df)[, list(Visit=.N,
       Diff=as.numeric(abs(diff(as.Date(Date, format='%d-%b-%y'))))),
     by = ID], ID+Visit~ Diff, value.var='Diff', length)

ID Visit 136 255 857
 1:  1 2   1   0   0
 2:  2 3   0   1   1
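
To see where 136, 255 and 857 come from, the gaps can be computed in base R first. This is a sketch, assuming `df` holds the five rows from the question and that the session locale understands English month abbreviations for `%b`; the name `gaps` is made up.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 Date = c("06-Sep-13", "20-Jan-14", "06-Mar-12",
                          "25-Jun-11", "29-Oct-13"),
                 stringsAsFactors = FALSE)

# %b is locale dependent; an English or C locale is assumed here
d <- as.Date(df$Date, format = "%d-%b-%y")

# Absolute difference in days between consecutive dates, per ID
gaps <- lapply(split(d, df$ID), function(z) abs(as.numeric(diff(z))))
gaps
```

The dates for ID 2 are not in chronological order, which is why abs() is needed; sorting within ID first would give different gaps.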





On Wednesday, February 11, 2015 5:47 PM, farnoosh sheikhi 
 wrote:



Hi Arun,

I have a data set that looks like the one below. I want to get the difference in dates 
for each unique ID, record it as a new variable X, and have a binary indicator for each one. 

ID   Date
1    06-Sep-13
1    20-Jan-14
2    06-Mar-12
2    25-Jun-11
2    29-Oct-13



For example, for the first two dates for ID=1 (20-Jan-14 - 06-Sep-13 ~ 121), 
I want the data to look as follows:

ID  Visit   121
1   2       1
2   3       0


I really appreciate if you can help me with this. I know I need to write some 
kind of loop, but I don't know how to think of the logic behind it.
Thanks a lot.



Farnoosh



Re: [R] 1st el of a list of vectors

2014-07-22 Thread arun
Or
rapply(l,function(x) x[1])
#[1] 1 3 7


set.seed(42)
 l1 <- replicate(1e6, list(sample(1:5,sample(8),replace=T)))
system.time(r1 <- sapply(l1, `[`, 1))
 #  user  system elapsed 
 # 1.324   0.000   1.326 

system.time(r2 <- rapply(l1, function(x) x[1]))
#   user  system elapsed 
#  0.736   0.004   0.741 

identical(r1,r2)
#[1] TRUE

system.time({
 # lengths() is base R (>= 3.2.0); elementLengths() was the Bioconductor equivalent
 eltlens <- lengths(l1)
 r3 <- unlist(l1, use.names=FALSE)[cumsum(eltlens) - eltlens + 1L]
})
# user  system elapsed 
#  0.153   0.000   0.154 


A.K.
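
The index arithmetic in the fastest version is easier to follow on the small list from the question. A sketch in base R (the names `starts`, `first_a` and `first_b` are made up):

```r
l <- list(c(1, 2), c(3, 5, 6), c(7))

# Type-safe variant of sapply(): always returns a numeric vector
first_a <- vapply(l, function(v) v[1], numeric(1))

# Flatten once, then pick the position where each vector starts
len <- lengths(l)                    # 2 3 1
starts <- cumsum(len) - len + 1L     # 1 3 6
first_b <- unlist(l, use.names = FALSE)[starts]
```

The offset trick assumes every element has at least one value; an empty vector in the list would make `starts` point at the wrong position.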


On Tuesday, July 22, 2014 12:11 AM, Richard M. Heiberger  
wrote:
l = list(c(1,2), c(3,5,6), c(7))

sapply(l, `[`, 1)

On Mon, Jul 21, 2014 at 3:55 PM, carol white  wrote:
> Hi,
> If we have a list of vectors of different lengths, how is it possible to 
> retrieve the first element of the vectors of the list?
>
>
> l = list(c(1,2), c(3,5,6), c(7))
>
> 1,3,7 should be retrieved
>
> Thanks
>
> Carol


Re: [R] filter one entry, in dependence of date

2014-07-23 Thread arun
Hi,
If `dat` is the dataset:
dat[!(dat$ID==2 & as.numeric(gsub("-.*","",dat$Month))<5),]
  ID   Month Value
1  1 03-2014     1
2  1 04-2014    10
3  1 05-2014    50
6  2 05-2014     4
7  2 06-2014     2

A.K.
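
The one-liner above compares only the month number, so it would misbehave if the data spanned more than one year. A sketch that compares real dates instead, assuming `dat` looks like the table in the question (the names `month_start` and `cutoff` are made up):

```r
dat <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2),
                  Month = c("03-2014", "04-2014", "05-2014",
                            "03-2014", "04-2014", "05-2014", "06-2014"),
                  Value = c(1, 10, 50, 8, 7, 4, 2),
                  stringsAsFactors = FALSE)

# Turn "mm-yyyy" into the first day of that month
month_start <- as.Date(paste0("01-", dat$Month), format = "%d-%m-%Y")
cutoff <- as.Date("2014-05-01")

# Drop rows of ID 2 that fall before the cutoff
res <- dat[!(dat$ID == 2 & month_start < cutoff), ]
res
```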



hello together, i have a short question, maybe you can help me.

I have a data.frame like this one
 ID    Month   Value
1    1 03-2014    1
2    1 04-2014    10
3    1 05-2014    50
4    2 03-2014 8
5    2 04-2014 7
6    2 05-2014 4
7    2 06-2014 2

I now want to create another data.frame without the lines from ID==2 which are 
earlier than 05-2014

The solution look like this one:

 ID    Month   Value
1    1 03-2014    1
2    1 04-2014    10
3    1 05-2014    50
4    2 05-2014 4
5    2 06-2014 2

maybe you can help me.

Best regards. Mat 




Re: [R] corresponding replicated el of one matrix in another matrix or vector

2014-07-23 Thread arun
Try:
rbind(v2,unname(setNames(v1[,1],v1[,2])[v2]))
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
v2 "a"  "a"  "a"  "a"  "a"  "c"  "c"  "c"  "c"  "c"   "c"   "c"   "c"   "c"  
   "1"  "1"  "1"  "1"  "1"  "3"  "3"  "3"  "3"  "3"   "3"   "3"   "3"   "3"  
   [,15] [,16] [,17] [,18]
v2 "c"   "b"   "b"   "b"  
   "3"   "2"   "2"   "2"  
A.K.
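
An equivalent lookup can be written with match(), which some may find easier to read than the named-vector trick. A sketch using the `v1` and `v2` from the question (the name `pos` is made up):

```r
v1 <- cbind(c("1", "2", "3"), c("a", "b", "c"))
v2 <- c(rep("a", 5), rep("c", 10), rep("b", 3))

# Row of v1 whose second column equals each element of v2
pos <- match(v2, v1[, 2])

# Stack the original values on top of the looked-up first column
v3 <- rbind(v2, v1[pos, 1])
```

match() is vectorised, so no loop over the rows of v1 and no grep is needed.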



Hi,
I have a matrix of unique elements (strings) like v1 and a vector which 
contains replicated values of the 2nd column of the first matrix.

v1 = cbind(c("1","2","3"),c("a","b","c"))

v2 = c(rep("a",5), rep("c",10), rep("b",3))

How can I add a column to v2 that contains the values of the first column of 
the first matrix v1 where the 2nd column of v1 matches the values of v2? Do I 
need to grep by looping over the nrow of v1 which is very time consuming or is 
there a better solution?

the results should be the same as


v3=rbind( c(rep("a",5), rep("c",10), rep("b",3)), c(rep("1",5), rep("3",10), 
rep("2",3)))

---
v1
 [,1] [,2] [,3]
[1,] "1"  "2"  "3"
[2,] "a"  "b"  "c"
> v2
 [1] "a" "a" "a" "a" "a" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "b" "b" "b"
> v3
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] "a"  "a"  "a"  "a"  "a"  "c"  "c"  "c"  "c"  "c"   "c"   "c"   "c"   "c" 
[2,] "1"  "1"  "1"  "1"  "1"  "3"  "3"  "3"  "3"  "3"   "3"   "3"   "3"   "3" 
 [,15] [,16] [,17] [,18]
[1,] "c"   "b"   "b"   "b" 
[2,] "3"   "2"   "2"   "2"  




Re: [R] is.na() == TRUE for POSIXlt time / date of "2014-03-09 02:00:00"

2014-07-30 Thread arun
Not able to reproduce the problem.
str(q)
# POSIXlt[1:1], format: "2014-03-09 02:00:00"
 is.na(q)
#[1] FALSE
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
A.K.




On Wednesday, July 30, 2014 1:10 PM, John McKown  
wrote:
"I'm so confused!" Why does is.na() report TRUE for a POSIXlt date &
time of 2014-03-09 02:00:00 ?

> q
[1] "2014-03-09 02:00:00"
> is.na(q)
[1] TRUE
> as.POSIXct(q)
[1] NA
> dput(q)
structure(list(sec = 0, min = 0L, hour = 2, mday = 9L, mon = 2L,
    year = 114L, wday = 0L, yday = 67L, isdst = 0L, zone = "",
    gmtoff = NA_integer_), .Names = c("sec", "min", "hour", "mday",
"mon", "year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"))
> str(q)
POSIXlt[1:1], format: "2014-03-09 02:00:00"
>


-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown



Re: [R] DATA SUMMARIZING and REPORTING

2014-07-30 Thread arun
For the example you gave:

x ##dataset

# Start and end month of each 3-month window
indx <- t(sapply(min(x$MTH_SUPPORT):(max(x$MTH_SUPPORT) - 2),
    function(m) c(m, m + 2)))

res <- do.call(rbind, apply(indx, 1, function(.indx) {
    x1 <- x[x$MTH_SUPPORT >= .indx[1] & x$MTH_SUPPORT <= .indx[2], ]
    Period <- paste(.indx[1], .indx[2], sep = "-")
    No.ofChange <- sum(x1$ATT_1[-1] != x1$ATT_1[-length(x1$ATT_1)])
    Paid = with(x1, sum(A3)/(sum(A1) + sum(A2)))
    data.frame(ID_CASE = x$ID_CASE[1L], Period, No.ofChange, Paid,
        stringsAsFactors = F)
}))


 res
  ID_CASE        Period No.ofChange      Paid
1   CB26A 201302-201304           2 0.4143646
2   CB26A 201303-201305           2 0.4452450
3   CB26A 201304-201306           1 0.4444444
4   CB26A 201305-201307           2 0.4607407
5   CB26A 201306-201308           1 0.4617737
6   CB26A 201307-201309           1 0.4513274
7   CB26A 201308-201310           1 0.4613779


With multiple ID_CASE values, either split the dataset by ID_CASE or use one of the 
grouping functions before applying this.


A.K.
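
A self-contained sketch of the same windowing idea, for anyone who wants to run it. It assumes the month column is called YEAR_MTH, as in the data posted in the question (the reply uses MTH_SUPPORT), and the helper `window_summary` is a made-up name. Note that integer arithmetic on yyyymm values only works while all months fall in the same calendar year.

```r
x <- data.frame(
  ID_CASE  = "CB26A",
  YEAR_MTH = c(201302, 201302, 201303, 201304, 201305, 201305,
               201306, 201307, 201309, 201310, 201310),
  ATT_1 = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1),
  A1 = c(146, 140, 128, 146, 134, 148, 134, 125, 122, 126, 107),
  A2 = c(42, 50, 36, 36, 36, 30, 20, 48, 44, 37, 43),
  A3 = c(74, 77, 77, 72, 80, 80, 72, 79, 74, 72, 75),
  stringsAsFactors = FALSE)

# Summarise one window [from, to] (hypothetical helper)
window_summary <- function(d, from, to) {
  w <- d[d$YEAR_MTH >= from & d$YEAR_MTH <= to, ]
  data.frame(Period = paste(from, to, sep = "-"),
             No.ofChange = sum(diff(w$ATT_1) != 0),   # value changes between rows
             Paid = sum(w$A3) / (sum(w$A1) + sum(w$A2)))
}

# Windows start at every month from the first to (last - 2)
starts <- min(x$YEAR_MTH):(max(x$YEAR_MTH) - 2)
res <- do.call(rbind, lapply(starts, function(s) window_summary(x, s, s + 2)))
res
```

For data that crosses a year boundary, the months would first need converting to a running month index or to Date values.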




On Wednesday, July 30, 2014 8:48 AM, Abhinaba Roy  
wrote:
Hi R-helpers,

I have a dataframe like

  ID_CASE YEAR_MTH ATT_1  A1 A2 A3
  CB26A     201302     1 146 42 74
  CB26A     201302     0 140 50 77
  CB26A     201303     0 128 36 77
  CB26A     201304     1 146 36 72
  CB26A     201305     1 134 36 80
  CB26A     201305     0 148 30 80
  CB26A     201306     0 134 20 72
  CB26A     201307     1 125 48 79
  CB26A     201309     0 122 44 74
  CB26A     201310     1 126 37 72
  CB26A     201310     1 107 43 75

I want a final dataframe which will look like

  ID_CASE Period        No.ofChange %Paid
  CB26A   201302-201304 2           0.414365
  CB26A   201303-201305 2           0.445245
  CB26A   201304-201306 1           0.444444
  CB26A   201305-201307 2           0.460741
  CB26A   201306-201308 1           0.461774
  CB26A   201307-201309 1           0.451327
  CB26A   201308-201310 1           0.461378
where,
Period = a time period of 3 months which is shifted by 1 month subsequently

No.ofChange = number of time ATT_1 has changed values in this period

%Paid = sum(A3)/(sum(A1)+sum(A2)) for this period
E.g. for Period=201302-201304,
%Paid = (74+77+77+72)/((146+140+128+146)+(42+50+36+36))

Period calculation should start from the first YEAR_MTH for the ID_CASE,
i.e., if for a ID_CASE first YEAR_MTH is 201301 or 201304 then the period
should be defined accordingly.

I have a dataframe with 400 unique ID_CASE, I need to do it for all ID_CASE.

How can I do it in R?

Regards,
Abhinaba



Re: [R] DATA SUMMARIZING and REPORTING

2014-07-30 Thread arun


With >1 ID_CASE, you may try:

xN <- x
xN$ID_CASE <- "CB27A" #creating another ID_CASE, other data same
x <- rbind(x, xN)
res1 <- do.call(rbind, lapply(split(x, x$ID_CASE), function(.x) {
    # Start and end month of each 3-month window for this ID_CASE
    indx <- with(.x, t(sapply(min(MTH_SUPPORT):(max(MTH_SUPPORT) - 2),
        function(y) c(y, y + 2))))
    do.call(rbind, apply(indx, 1, function(.indx) {
        x1 <- .x[with(.x, MTH_SUPPORT >= .indx[1] & MTH_SUPPORT <= .indx[2]), ]
        Period <- paste(.indx[1], .indx[2], sep = "-")
        x2 <- within(x1, {
            Paid <- sum(A3)/(sum(A1) + sum(A2))
            No.ofChange <- sum(ATT_1[-1] != ATT_1[-length(ATT_1)])
        })
        data.frame(ID_CASE = .x$ID_CASE[1L], Period,
            No.ofChange = x2$No.ofChange[1L],
            Paid = x2$Paid[1L], stringsAsFactors = F)
    }))
}))

row.names(res1) <- 1:nrow(res1)
> res1
   ID_CASE        Period No.ofChange      Paid
1    CB26A 201302-201304           2 0.4143646
2    CB26A 201303-201305           2 0.4452450
3    CB26A 201304-201306           1 0.4444444
4    CB26A 201305-201307           2 0.4607407
5    CB26A 201306-201308           1 0.4617737
6    CB26A 201307-201309           1 0.4513274
7    CB26A 201308-201310           1 0.4613779
8    CB27A 201302-201304           2 0.4143646
9    CB27A 201303-201305           2 0.4452450
10   CB27A 201304-201306           1 0.4444444
11   CB27A 201305-201307           2 0.4607407
12   CB27A 201306-201308           1 0.4617737
13   CB27A 201307-201309           1 0.4513274
14   CB27A 201308-201310           1 0.4613779
A.K.






Re: [R] Question

2014-07-30 Thread arun
Hi Farnoosh,

Regarding the first question:

dat2 <- dat1
dat1$Mean <- setNames(unsplit(sapply(split(dat1[,-1], dat1[,1]),rowMeans, 
na.rm=T),dat1[,1]),NULL)
dat1
  Unit q1 q2 q3 Mean
1    A  3  1  2 2.00
2    A  2 NA  1 1.50
3    B  2  2  4 2.67
4    B NA  2  5 3.50
5    C  3  2 NA 2.50
6    C  4  1  4 3.00
7    A  3  2 NA 2.50


The second question is not clear.  Assuming that you want something like this:
 dat2[,-1] <- (!is.na(dat2[,-1]))+0
 dat2$indx <- with(dat2, ave(rep(1, nrow(dat2)), Unit, FUN=cumsum))
 library(reshape2)
dcast(melt(dat2, id.var=c("indx","Unit")), variable+indx~Unit, 
value.var="value", fill=0)[,-2]
  variable A B C
1   q1 1 1 1
2   q1 1 0 1
3   q1 1 0 0
4   q2 1 1 1
5   q2 0 1 1
6   q2 1 0 0
7   q3 1 1 0
8   q3 1 1 1
9   q3 0 0 0



A.K.
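
For the first question, a shorter route may be enough, since the mean is taken within each row and the Unit grouping does not affect it. A sketch with the data from the question (the name `answered` is made up):

```r
dat1 <- read.table(text = "Unit q1 q2 q3
A 3 1 2
A 2 NA 1
B 2 2 4
B NA 2 5
C 3 2 NA
C 4 1 4
A 3 2 NA", header = TRUE, stringsAsFactors = FALSE)

# Number of answered (non-NA) questions per row
answered <- unname(rowSums(!is.na(dat1[, c("q1", "q2", "q3")])))

# Row mean over the answered questions only
dat1$Mean <- unname(rowMeans(dat1[, c("q1", "q2", "q3")], na.rm = TRUE))
dat1
```

With na.rm = TRUE, rowMeans() divides by the number of non-missing values, so row 2 becomes (2+1)/2 as requested.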



On Wednesday, July 30, 2014 1:42 PM, farnoosh sheikhi  
wrote:



Hi Arun,


I have two questions, I have a data like below:

dat1<-read.table(text="
Unit q1 q2 q3
A 3 1 2
A 2 NA 1
B 2 2 4
B NA 2 5
C 3 2 NA
C 4 1 4
A 3 2 NA
",sep="",header=T,stringsAsFactors=F)

I want to get the average of each row by the number of answered questions. For 
example second row would be (2+1)/2 since there is a NA.

Secondly, I want to pivot the units like: UnitA, UnitB, Unit C  as columns and 
have 1 and zero as values.

Thanks a lot for your help.



Re: [R] separate numbers from chars in a string

2014-07-30 Thread arun
If you have some variations of the order of numbers followed by chars,

library(stringr)

v1 <- c("absdfds0213451ab", "123abcs4145")
pattern=c("[A-Za-z]+", "\\d+")

do.call(`Map`,c(c,lapply(pattern, function(.pat) str_extract_all(v1, .pat))))
#[[1]]
#[1] "absdfds" "ab"  "0213451"

#[[2]]
#[1] "abcs" "123"  "4145"
A.K.
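
If the pieces should stay in the order in which they appear in the string, base R can do that without stringr. A sketch using the same `v1` (the name `parts` is made up):

```r
v1 <- c("absdfds0213451ab", "123abcs4145")

# Match either a run of letters or a run of digits, left to right
parts <- regmatches(v1, gregexpr("[A-Za-z]+|[0-9]+", v1))
parts
```

Here the first string gives "absdfds", "0213451", "ab", whereas the stringr version above groups all letter runs before the digit runs.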



Hi,
If I have a string of consecutive chars followed by consecutive numbers and 
then chars, like "absdfds0213451ab", how to separate the consecutive chars from 
consecutive numbers?

grep doesn't seem to be helpful

grep("[a-z]","absdfds0213451ab", ignore.case=T)
[1] 1


 grep("[0-9]","absdfds0213451ab", ignore.case=T)
[1] 1

Thanks

Carol 




Re: [R] Regex - subsetting parts of a file name.

2014-07-31 Thread arun
Try:
gsub(".*\\.(.*)\\..*","\\1", my.cache.list)
[1] "subject_test"  "subject_train" "y_test"    "y_train" 

#or

library(stringr)
# stringr >= 1.0 uses ICU regular expressions, which support lookarounds directly
str_extract(my.cache.list, '(?<=\\.).*(?=\\.)')
[1] "subject_test"  "subject_train" "y_test"    "y_train"  

A.K.
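
A slightly stricter base-R pattern, as a sketch: it anchors both ends and only allows dot-free text before the first dot and after the last one. The name `inner` is made up.

```r
my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
                   "df.y_test.RData", "df.y_train.RData")

# prefix without dots, a literal dot, the part to keep, a literal dot, suffix
inner <- sub("^[^.]*\\.(.*)\\.[^.]*$", "\\1", my.cache.list)
inner
```

Note that in the original workaround gsub('df.', ...) the unescaped dot matches any character, so it would also strip "dfx" from inside a name; escaping it as "\\." or using fixed = TRUE avoids that.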




On Thursday, July 31, 2014 11:05 AM, arnaud gaboury  
wrote:
A directory is full of data.frames cache files. All these files have
the same pattern:

df.some_name.RData

my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
"df.y_test.RData",
"df.y_train.RData")

I want to keep only the part inside the two points. After lots of
headache using grep() when trying something like this:

grep('.(.*?).','df.subject_test.RData',value=T)

I couldn't find a clean one liner and found this workaround:

my.cache.list <- gsub('df.','',my.cache.list)
my.cache.list <- gsub('.RData','',my.cache.list)

The two above commands do the trick, but a clean one line with some
regex expression would be a more "elegant" way.

Does anyone have any suggestion ?

TY for help



Re: [R] how to extract word before /// in a data frame contain many thousands rows.

2014-07-31 Thread arun
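Try the following, where dat is the dataset:


   library(stringr)
   # Take the word immediately before " ///" (NA where there is no "///");
   # stringr >= 1.0 handles the lookahead without perl()
   res <- str_extract(dat$Gene.Symbol, '[[:alnum:]]+(?= \\/)')
   res[!is.na(res)]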
 #[1] "CDH23"

A.K.
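
The same extraction also works in base R. A sketch on a three-row excerpt of the posted data (the names `has_sep` and `first_sym` are made up):

```r
dat <- data.frame(Gene.Symbol = c("CORO6", "CDH23 /// LOC100653137", "IRF6"),
                  stringsAsFactors = FALSE)

# Rows that contain the separator
has_sep <- grepl("///", dat$Gene.Symbol, fixed = TRUE)

# Drop the separator and everything after it
first_sym <- sub("[[:space:]]*///.*$", "", dat$Gene.Symbol[has_sep])
first_sym
```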




On Thursday, July 31, 2014 9:54 PM, Stephen HK Wong  wrote:
Dear All,

I would appreciate it if you could help me with this. I have a data frame that contains many 
thousands of rows, and some rows have a /// symbol, as shown in row 2. I 
want to extract the word before ///, in this case CDH23. Many thanks.
Probe.Set.ID            Gene.Symbol
1  1552301_a_at                  CORO6
2  1552436_a_at CDH23 /// LOC100653137
3  1552477_a_at                   IRF6
4  1552685_a_at                  GRHL1
5    1552742_at                  KCNH8
6  1552752_a_at                  CADM2
7    1552799_at                TSNARE1
8  1552897_a_at                  KCNG3
9  1552902_a_at                  FOXP2
10   1552903_at               B4GALNT2


structure(list(Probe.Set.ID = c("1552301_a_at", "1552436_a_at", 
"1552477_a_at", "1552685_a_at", "1552742_at", "1552752_a_at", 
"1552799_at", "1552897_a_at", "1552902_a_at", "1552903_at"), 
    Gene.Symbol = c("CORO6", "CDH23 /// LOC100653137", "IRF6", 
    "GRHL1", "KCNH8", "CADM2", "TSNARE1", "KCNG3", "FOXP2", "B4GALNT2"
    )), .Names = c("Probe.Set.ID", "Gene.Symbol"), row.names = c(NA, 
10L), class = "data.frame")


Stephen HK Wong



Re: [R] How to transform the data frame into the list?

2014-08-01 Thread arun
Use ?split()
split(dat[,-4], dat$Year_Month) #dat is the dataset.

A.K.
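
A small sketch of what split() returns here, on a five-row excerpt of the posted table (the names `lst` and `may` are made up):

```r
dat <- data.frame(Country = c("AE", "DE", "CN", "AE", "DE"),
                  Product = c(1, 1, 1, 1, 1),
                  Price = c(20, 20, 28, 20, 20),
                  Year_Month = c(201204, 201204, 201204, 201205, 201205),
                  stringsAsFactors = FALSE)

# One data frame per Year_Month, without the Year_Month column itself
lst <- split(dat[, -4], dat$Year_Month)
names(lst)

# Elements are named by the month as a character string
may <- lst[["201205"]]
may
```

The list elements must be accessed by name as a string; lst[[201205]] would be read as a position and fail.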


   Country  Product   Price  Year_Month
 AE 1   20    201204
 DE 1   20    201204
 CN 1   28    201204
 AE 2   28    201204
 DE 2   28    201204
 CN 2   22    201204
 AE 3   28    201204
 CN 3   28    201204
 AE 1   20    201205
 DE 1   20    201205
 CN 1   28    201205
 AE 2   28    201205
 DE 2   28    201205

How to create the list? which has:
[[201204]]
  Country  Product   Price  
 AE 1   20    
 DE 1   20  
 CN 1   28    
 AE 2   28    
 DE 2   28    
 CN 2   22    
 AE 3   28  
 CN 3   28    

[[201205]]
  Country  Product   Price  
 AE 1   20  
 DE 1   20  
 CN 1   28    
 AE 2   28    
 DE 2   28 



Re: [R] Better use with gsub

2014-08-01 Thread arun


You could try:
library(stringr)
  
# digits that follow a capital letter or a colon
simplify2array(str_extract_all(xx, '(?<=[A-Z]|\\:)\\d+'))
 [,1] [,2] [,3]  [,4]  [,5]  [,6] 
[1,] "24" "24" "24"  "24"  "24"  "24" 
[2,] "57" "86" "119" "129" "138" "163"
A.K.

On Friday, August 1, 2014 10:49 AM, "Doran, Harold"  wrote:
I have done an embarrassingly bad job using a mixture of gsub and strsplit to 
solve a problem. Below is sample code showing what I have to start with (the 
vector xx) and I want to end up with two vectors x and y that contain only the 
digits found in xx.

Any regex users with advice most welcome

Harold

xx <- c("S24:57",   "S24:86",   "S24:119",  "S24:129",  "S24:138",  "S24:163")
yy <- gsub("S","\\1", xx)
a1 <- gsub(":"," ", yy)
a2 <- sapply(a1, function(x) strsplit(x, ' '))
x <- as.numeric(sapply(a2, function(x) x[1]))
y <- as.numeric(sapply(a2, function(x) x[2]))



Re: [R] Better use with gsub

2014-08-01 Thread arun
Forgot about as.numeric.

 sapply(str_extract_all(xx, '(?<=[A-Z]|\\:)\\d+'),as.numeric)
 [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   24   24   24   24   24   24
[2,]   57   86  119  129  138  163
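
Since the strings have a fixed "S<number>:<number>" shape, a base-R split may be all that is needed to get the two vectors x and y. A sketch, assuming every element follows that shape (the name `parts` is made up):

```r
xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")

# Drop the leading "S", split on the colon, and bind into a two-column matrix
parts <- do.call(rbind, strsplit(sub("^S", "", xx), ":", fixed = TRUE))

x <- as.numeric(parts[, 1])
y <- as.numeric(parts[, 2])
```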








Re: [R] Combining Rows from One Data Frame, Outputting into Another

2014-08-01 Thread arun
You could use:

    library(dplyr)
    library(tidyr)
  x.df %>% group_by(Year, Group, Eye_Color) %>% summarize(n=n()) %>% 
spread(Eye_Color,n, fill=0)
Source: local data frame [6 x 5]

  Year Group blue brown green
1 2000 1    2 1 0
2 2000 2    0 0 2
3 2001 1    1 0 0
4 2001 2    1 1 0
5 2001 3    1 0 0
6 2002 1    1 0 0



Or

library(reshape2)
# count the rows in each Year/Group/Eye_Color cell
dcast(x.df, Year+Group~Eye_Color, value.var="Eye_Color", fun.aggregate=length)
A.K.


On Friday, August 1, 2014 7:06 PM, Kathy Haapala  wrote:
If I have a dataframe x.df as follows:
> x.df <- data.frame(Year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001,
2001, 2001, 2002), Group = c(1, 1, 1, 2, 2, 1, 2, 2, 3, 1), Eye_Color =
c("blue", "blue", "brown", "green", "green", "blue", "brown", "blue",
"blue", "blue"))

> x.df
   Year Group Eye_Color
1  2000     1      blue
2  2000     1      blue
3  2000     1     brown
4  2000     2     green
5  2000     2     green
6  2001     1      blue
7  2001     2     brown
8  2001     2      blue
9  2001     3      blue
10 2002     1      blue

how can I turn it into a new dataframe that would take the data from
multiple rows of Year/Group combinations and output the data into one row
for each combination, like this:
> x_new.df
  Year Group No_blue No_brown No_green
1 2000     1       2        1        0
2 2000     2       0        0        2
3 2001     1       1        0        0
4 2001     2       1        1        0
5 2001     3       1        0        0
6 2002     1       1        0        0

I've been trying to use for loops, but I'm wondering if anyone has a better
or more simple suggestion.



Re: [R] Compare data in two rows and replace objects in data frame

2014-08-04 Thread arun
You could try data.table

#dat is the dataset


library(data.table)
# Lookup table: the names are the two stacked values pasted together
v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"), c("11", "10", "01", "1-", "-1"))
dat2 <- setDT(dat)[, lapply(.SD, function(x) v1[paste(x, collapse="")]),
    by=CloneID]

A.K.
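
The same lookup idea can be sketched in base R without any grouping, assuming the two rows of each CloneID are always adjacent (as in the posted example). The column names g2001 to g2003 and the objects `odd`, `key` and `calls` are made up for illustration.

```r
dat <- data.frame(CloneID = c(2471250, 2471250, 2433062, 2433062,
                              100021605, 100021605),
                  g2001 = c(1, 0, 0, 1, 1, 1),
                  g2002 = c(1, 0, 0, 1, 1, 0),
                  g2003 = c(1, 0, 0, 1, 0, 1))

# Lookup: first-row value pasted to second-row value -> call
v1 <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")

odd <- seq(1, nrow(dat), by = 2)            # first row of each pair
key <- paste0(as.matrix(dat[odd, -1]),      # element-wise paste of the
              as.matrix(dat[odd + 1, -1]))  # first and second rows

# Look up every key at once and restore the pairs x genotypes shape
calls <- matrix(unname(v1[key]), nrow = length(odd),
                dimnames = list(NULL, names(dat)[-1]))
res <- data.frame(CloneID = dat$CloneID[odd], calls, stringsAsFactors = FALSE)
res
```

Combinations that are not in the lookup (for example "00") come out as NA, and the whole table is processed in one vectorised step instead of a double loop.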




On Monday, August 4, 2014 5:55 AM, raz  wrote:
Dear all,

I have a data frame 144 x 2 values.
I need to take every value in the first row and compare to the second row,
and the same for rows 3-4 and 5-6 and so on.
the output should be one line for each of the two row comparison.
the comparison is:
if row1==1 and row2==1 <-'HT'
if row1==1 and row2==0 <-'A'
if row1==0 and row2==1 <-'B'
if row1==1 and row2=='-' <-'Aht'
if row1=='-' and row2==1 <-'Bht'

for example:
if the data is:
CloneID    genotype 2001    genotype 2002    genotype 2003
2471250    1    1    1
2471250    0    0    0
2433062    0    0    0
2433062    1    1    1
100021605    1    1    0
100021605    1    0    1
15599    1    1    0
15599    1    1    1
12798    1    1    0
12798    1    1    1

then the output should be:
CloneID    genotype 2001    genotype 2002    genotype 2003
2471250    A    A    A
2433062    B    B    B
100021605    HT    A    B
15599    HT    HT    B
12798    HT    HT    B

I tried this on the whole data set, but it is very slow:

AX <- data.frame(lapply(AX, as.character), stringsAsFactors=FALSE)


for (i in seq(1,nrow(AX),by=2)){
for (j in 6:144){
if (AX[i,j]==1 & AX[i+1,j]==0){
AX[i,j]<-'A'
}
if (AX[i,j]==0 & AX[i+1,j]==1){
AX[i,j]<-'B'
}
if (AX[i,j]==1 & AX[i+1,j]==1){
AX[i,j]<-'HT'
}
if (AX[i,j]==1 & AX[i+1,j]=="-"){
AX[i,j]<-'Aht'
}
if (AX[i,j]=="-" & AX[i+1,j]==1){
AX[i,j]<-'Bht'
}
}
}

AX1<-AX[!duplicated(AX[,3]),]
AX2<-AX[duplicated(AX[,3]),]

Thanks for any help,

Raz



-- 
\m/



Re: [R] extract descriptive stats for categorial data from dataframe

2014-08-05 Thread arun
You could try:
lv <- levels(unique(unlist(df)))
as.data.frame(t(apply(df, 2, function(x) table(factor(x, levels=lv)))))
    +  - 0
i1 10  0 0
i2 10  0 0
i3  0 10 0
i4  0  9 1
i5 10  0 0
i6  1  9 0
i7  9  0 1
i8  4  2 4
i9  7  1 2
A.K.
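
A variant that does not depend on the columns being factors is sketched below (levels() returns NULL for character columns). It assumes the three categories are known in advance; only three of the nine columns from the question are retyped here.

```r
# Three of the columns from the question, kept as character
df <- data.frame(i1 = rep("+", 10),
                 i4 = c(rep("-", 2), "0", rep("-", 7)),
                 i8 = c(rep("0", 4), rep("+", 3), "-", "+", "-"),
                 stringsAsFactors = FALSE)

lv <- c("+", "-", "0")   # categories stated explicitly (an assumption)

# Tabulate every column against the same set of levels, one row per variable
freq <- as.data.frame(t(sapply(df, function(x) table(factor(x, levels = lv)))))
freq
```

Fixing the levels up front keeps categories with zero counts in every row.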




On Tuesday, August 5, 2014 5:36 AM, Alain D.  wrote:
Dear R-List,

I want to have descriptive stats in a special form and cannot figure out a nice
solution.

df<-as.data.frame(cbind(i1=rep("+"),i2=rep("+",10),i3=rep("-",10),i4=c(rep("-",2),"0",rep("-",7)),i5=rep("+",10),i6=c(rep("-",9),"+"),i7=c(rep("+",4),"0",rep("+",5)),i8=c(rep(0,4),rep("+",3),"-","+","-"),i9=c(rep("+",5),"-",rep("+",2),rep(0,2))))

now I want the categories as var labels arranged in cols with IDs as first col
and then frequencies for each category. Something like this:

var   +   -    0
i1    10  0    0
i2    10  0    0
i3     0 10    0
i4     0  9    1
i5    10  0    0
i6     1  9    0
i7     9  0    1
i8     4  2    4
i9     7  1    2

I tried different combinations of

freq<-as.data.frame(df<-lapply(df,table))

but was not very successful.

I would be very thankful for an easy solution which is probably too obvious for
me to spot.

Thank you very much.

Best wishes

Alain


Re: [R] populating matrix with binary variable after matching data from data frame

2014-08-12 Thread arun
You could try:
x1$V2[1] <- "TCLA1"


  x[outer(rownames(x), colnames(x), FUN=paste) %in% 
as.character(interaction(x1, sep=" "))] <- 1
x
   TCLA1 VPS41 ABCA13 ABCA4
AKT3   1 0  0 0
AKTIP  0 1  0 0
ABCA13 0 0  0 0
ABCA4  0 0  0 0
A.K.
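
An alternative sketch that indexes the matrix directly by row and column names is given below. It assumes, as above, that the spelling in x1 has been aligned with the column names of x; only the first three pairs from the question are retyped.

```r
x <- matrix(0, 4, 4,
            dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                            c("TCLA1", "VPS41", "ABCA13", "ABCA4")))
x1 <- data.frame(V1 = c("AKT3", "AKTIP", "AKTIP"),
                 V2 = c("TCLA1", "VPS41", "PDPK1"),
                 stringsAsFactors = FALSE)

# Keep only pairs whose names actually occur in the matrix;
# indexing with an unknown name would raise an error
keep <- x1$V1 %in% rownames(x) & x1$V2 %in% colnames(x)

# A two-column character matrix selects cells by (row name, column name)
x[cbind(x1$V1[keep], x1$V2[keep])] <- 1
x
```

This avoids building all name combinations with outer(), which may matter for a 1211x1211 matrix.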


On Tuesday, August 12, 2014 8:16 PM, Adrian Johnson  
wrote:
Hi:
sorry I have a basic question.

I have a data frame with two columns:
> x1
      V1       V2
1   AKT3    TCL1A
2  AKTIP    VPS41
3  AKTIP    PDPK1
4  AKTIP   GTF3C1
5  AKTIP    HOOK2
6  AKTIP    POLA2
7  AKTIP KIAA1377
8  AKTIP FAM160A2
9  AKTIP    VPS16
10 AKTIP    VPS18


I have a 1211x1211 matrix (its row and column names use some elements from
x1$V1 and some from x1$V2). I want to set a cell to 1 for every matching pair
(for example, AKT3 - TCL1A gets 1, whereas AKT3 - VPS41 gets 0).
How can I map these binary relations into x?


>x
       TCLA1 VPS41 ABCA13 ABCA4
AKT3       0     0      0     0
AKTIP      0     0      0     0
ABCA13     0     0      0     0
ABCA4      0     0      0     0


dput -

x = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim =
c(4L,
4L), .Dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"
), c("TCLA1", "VPS41", "ABCA13", "ABCA4")))

x1 = structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
"AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
"VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
"VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
10L), class = "data.frame")



Thanks
Adrian



Re: [R] how to avoid change string to number automaticlly in r

2014-08-15 Thread arun
A similar question was posted on Stack Overflow 
(http://stackoverflow.com/questions/25328311/how-to-avoid-change-string-to-number-automaticlly-in-r),
 where it already has an accepted answer.

A.K.
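
The linked answer is not reproduced here. As a hedged guess at the cause: if the column M is a factor, its integer codes are what end up in the matrix. The sketch below retypes the table from the question (the object names are illustrative) and converts the factor to character first.

```r
# Table from the question, with M stored as a factor (an assumption)
res <- data.frame(trt = c(0, 111, 125, 14),
                  means = c(12.16673, 11.86369, 11.74433, 11.54073),
                  M = factor(c("a", "ab", "ab", "b")))

codes  <- as.integer(res$M)     # the underlying level codes: 1 2 2 3
labels <- as.character(res$M)   # the labels themselves: "a" "ab" "ab" "b"

# Store the labels, not the codes, in the matrix
out <- matrix(labels, nrow = 1, dimnames = list("J0001", NULL))
out
```

Reading the data with stringsAsFactors=FALSE would avoid the factor in the first place.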




On Friday, August 15, 2014 2:18 PM, Wenlan Tian  wrote:
I was trying to save some string into a matrix, but it automatically
changed to numbers (levels). How can i avoid it??

Here is the original table:

  trt    means  M
1   0 12.16673  a
2 111 11.86369 ab
3 125 11.74433 ab
4  14 11.54073  b

I want to save it to a matrix like:
J0001 a ab ab b

But, what i get is:
J0001 1 2 2 3

How can i avoid this?



Re: [R] regex pattern assistance

2014-08-15 Thread arun


Hi Tom,
You could try:
library(stringr)
str_extract(x, perl("(?<=[A-Za-z]{4}/).*(?=/[0-9])"))
#[1] "S01-012"
A.K.
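
The perl() wrapper belongs to the stringr version of the time and may not be available in newer releases, so here is a base R sketch of the same extraction. The first pattern assumes the fixed prefix from the question; the second reuses the lookaround idea with perl=TRUE. A negated bracket expression such as [^/]+ also avoids relying on \\w inside square brackets, which some regex engines may not treat as a character class.

```r
x <- "/mnt/AO/AO Data/S01-012/120824/"

# Capture everything up to the next slash after the known prefix
id1 <- sub("^/mnt/AO/AO Data/([^/]+)/.*$", "\\1", x)

# Same idea with lookbehind/lookahead, using the PCRE engine
id2 <- regmatches(x, regexpr("(?<=[A-Za-z]{4}/)[^/]+(?=/[0-9])", x, perl = TRUE))

id1
id2
```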



On Friday, August 15, 2014 12:20 PM, Tom Wright  wrote:
Hi,
Can anyone please assist.

given the string 

> x<-"/mnt/AO/AO Data/S01-012/120824/"

I would like to extract "S01-012"

require(stringr)
> str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/+")
> str_match(x,"\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+")

both nearly work. I expected I would use something like:
> str_match(x,"\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+")

but I don't seem able to get the square bracket grouping to work
correctly. Can someone please show me where I am going wrong?

Thanks,
Tom



Re: [R] ANY ONE HERE PLZ Urgent

2014-08-28 Thread arun
Try:

format(as.Date("05/07/2014", "%m/%d/%Y"), "%m")
#[1] "05"

#or
strptime("05/07/2014", "%m/%d/%Y")$mon+1
#[1] 5



A.K.
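
The two variants above differ in what they return, which the following sketch makes explicit (the variable names are only illustrative):

```r
d <- as.Date("05/07/2014", format = "%m/%d/%Y")

month_chr <- format(d, "%m")               # character, zero padded
month_num <- as.integer(format(d, "%m"))   # integer month number
month_lt  <- as.POSIXlt(d)$mon + 1         # $mon counts from 0, hence the + 1

month_chr
month_num
month_lt
```

The character form is handy for labels, the numeric forms for calculations or sorting.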


How to extract a Month from Date object?

Almost 13 people have viewed my question in "New to R" without replying, and I 
have a task to finish.



Could you please help me?

How to extract a Month from Date object?

as.month("05/07/2014", format = "%m")

I tried this.



Re: [R] r convert current date format from y-m-d to m/d/y

2014-09-01 Thread arun
Hi,

Use  ?format

 format(d, "%m/%d/%Y")
#[1] "09/01/2014"

A.K.
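
Note that format() returns a character string, not a Date. A small sketch with a fixed date (instead of Sys.Date(), so the result is reproducible) shows the conversion in both directions:

```r
d <- as.Date("2014-09-01")

s <- format(d, "%m/%d/%Y")                # Date -> character in m/d/y layout
back <- as.Date(s, format = "%m/%d/%Y")   # character -> Date again

s
class(s)
back
```

Date arithmetic should be done on the Date object; the formatted string is only for display.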


On Monday, September 1, 2014 5:26 AM, Velappan Periasamy  
wrote:





d=Sys.Date()
"2014-09-01"

How to convert this "2014-09-01" to "09/01/2014" format?

(ie y-m-d to m/d/y format)

thanks

veepsirtt



Re: [R] Generate sequence of date based on a group ID

2014-10-08 Thread arun


If the `ids` are ordered as shown in the example, perhaps you need

   tbl <- table(df$id)
   
   rep(seq(as.Date("2000-01-01"), length.out=length(tbl), by=1), tbl)
[1] "2000-01-01" "2000-01-01" "2000-01-01" "2000-01-01" "2000-01-01"
[6] "2000-01-02" "2000-01-02" "2000-01-02" "2000-01-02" "2000-01-02"
[11] "2000-01-03" "2000-01-03" "2000-01-03" "2000-01-03" "2000-01-03"
[16] "2000-01-04" "2000-01-04" "2000-01-04" "2000-01-04" "2000-01-05"
[21] "2000-01-05" "2000-01-05" "2000-01-05"

A.K.
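
If the ids are not guaranteed to be ordered, a sketch based on match() may be safer. It assumes each distinct id should get the next day in order of first appearance; the id vector reproduces the group sizes of the sample data.

```r
id <- rep(1:5, times = c(5, 5, 5, 4, 4))   # same ids as df$id in the question
start <- as.Date("2000-01-01")

# Position of each id among the distinct ids gives the day offset
date <- start + (match(id, unique(id)) - 1)

head(date)
tail(date)
```

No length needs to be specified: the sequence ends with the last distinct id.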


On Wednesday, October 8, 2014 3:57 AM, Kuma Raj  wrote:



I want to generate a sequence of date based on a group id(similar IDs
should have same date). The id variable contains unequal observations
and the length of the data set also varies.  How could I create a
sequence that starts on specific date (say January 1, 2000 onwards)
and continues until the end without specifying length?


Sample data follows:

df<-structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), out1 = c(0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 1L)), .Names = c("id", "out1"), class =
"data.frame", row.names = c(NA,
-23L))



Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-09 Thread arun
You could try

library(dplyr)
data1 %>% 

  rowwise() %>%
   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
   na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: 

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.
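
For comparison, a base R sketch of the same row-wise maximum, assuming pmax() is acceptable in place of rowwise(). max() returns a single value over all of its arguments, which would explain why the original mutate() call put 2011-11-04 into every non-missing row; pmax() works element by element and leaves an all-NA row as NA. The data are retyped from the question.

```r
data1 <- data.frame(
  id = as.character(1:7),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24", "2007-10-10",
                      "2006-09-01", "2007-09-04", "2005-10-25")),
  cocdate = as.Date(c("2008-07-18", NA, NA, NA, "2005-08-10", "2011-10-05", NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13", NA, NA, NA, NA)),
  haldate = as.Date(c("2007-11-07", NA, NA, NA, NA, NA, "2011-11-04")),
  stringsAsFactors = FALSE)

# Element-wise (row by row) maximum across the four date columns
data1$oldflag <- do.call(pmax, c(data1[-1], na.rm = TRUE))
data1
```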


On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" 
 wrote:
Hello,



The example data frame in the reproducible code below has 5 columns (1 column 
for id and 4 columns for dates), and there are 7 observations.  I would like to 
insert the most recent date from those 4 date columns into a new column 
(oidflag) using the mutate() function in the dplyr package.   I am getting 
correct results (NA in the new column) if a given row has all NA's in the four 
columns.  However, the issue is that the date value inserted into the new 
column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA 
value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see 
the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





##  from the console 

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##  Reproducible code and data 
#



library(dplyr)
library(lubridate)
library(zoo)

# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
colClasses=c("character", "Date", "Date", "Date", "Date"),
header=TRUE, as.is=TRUE
)

# create a new column

data2 <- mutate(data1,
  oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & 
                  is.na(inhdate)  & is.na(haldate), NA,
                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )
  )
)

# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")

# print records

print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260







Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-11-09 Thread arun


Dear Pradip,

From the documentation of ?max: 


   The minimum and maximum of a numeric empty set are ‘+Inf’ and
‘-Inf’ 

One of the rows in your dataset is all `NA`s.  With na.rm=TRUE nothing is left 
in that row, so max() returns -Inf with a warning.  I am not sure you want to 
keep that row.  You could remove it and run the code, or keep it and run with 
that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]

data1 %>% 

  rowwise()%>%
  mutate(oldflag= as.Date(max(mrjdate, cocdate, inhdate, haldate, 
na.rm=TRUE), origin='1970-01-01'))


A.K.
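
If the all-NA row should stay in the data, another option is to guard the call to max(). The sketch below uses a hypothetical helper, row_max_date(), that returns NA instead of -Inf when every value is missing; only two shortened columns are used for illustration.

```r
# Hypothetical helper: maximum of several dates, NA if all are missing
row_max_date <- function(...) {
  v <- c(...)
  if (all(is.na(v))) as.Date(NA) else max(v, na.rm = TRUE)
}

a <- as.Date(c("2004-11-04", NA, "2009-10-24"))
b <- as.Date(c("2008-07-18", NA, NA))

# Apply the helper position by position and combine the results into one Date vector
newest <- do.call(c, lapply(seq_along(a), function(i) row_max_date(a[i], b[i])))
newest
```

Because max() is only called when at least one value is present, the warning does not appear.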
On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" 
 wrote:



Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code.
But,  I am getting the following warning message.  Any suggestions on how to 
avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


#
data1 %>% 
+  
+   rowwise() %>%
+   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+  na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups: 

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf


Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260



Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

2014-11-10 Thread arun
Try

range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)])
#[1] "2006-09-01" "2011-11-04"



If you look at the `dput` output, it is `-Inf` for the all-NA row of oiddate
dput(data2$oiddate)
structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date")

   

A.K.
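
Alternatively, the -Inf can be turned into a proper NA once, after which range() with na.rm=TRUE behaves as for the other columns. A minimal sketch, using the values from the dput output above:

```r
oiddate <- structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282),
                     class = "Date")

# Replace non-finite dates (here -Inf) with NA
oiddate[!is.finite(oiddate)] <- NA

r <- range(oiddate, na.rm = TRUE)
r
```

complete.cases() did not drop the element because -Inf is not NA, even though it prints as NA for a Date.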

On Monday, November 10, 2014 11:15 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" 
 wrote:
Hello,

The range() with complete.cases() removes NA's for the date variables that are 
read from a data frame.  However, the issue is that the same function does not 
remove NA's for the other date variable that is created using the 
dplyr/mutate().  The console and the reproducible example are given below. Any 
advice how to resolve this issue would be appreciated.

Thanks,

Pradip Muhuri


#  cut and pasted from the R console 

  id    mrjdate    cocdate    inhdate    haldate    oiddate
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data2$cocdate[complete.cases(data2$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data2$inhdate[complete.cases(data2$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data2$haldate[complete.cases(data2$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data2$oiddate[complete.cases(data2$oiddate)])
[1] NA   "2011-11-04"


  reproducible code #

library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

data1 <- read.table(textConnection(temp),
colClasses=c("character", "Date", "Date", "Date", "Date"),
header=TRUE, as.is=TRUE
)


# create a new column

data2 <- data1 %>%
 rowwise() %>%
  mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
   na.rm=TRUE), 
origin='1970-01-01'))

# print records

print (data2)

# range of dates

range(data2$mrjdate[complete.cases(data2$mrjdate)])
range(data2$cocdate[complete.cases(data2$cocdate)])
range(data2$inhdate[complete.cases(data2$inhdate)])
range(data2$haldate[complete.cases(data2$haldate)])
range(data2$oiddate[complete.cases(data2$oiddate)])





Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260





Re: [R] Subsetting multiple rows of a data frame at once

2013-07-03 Thread arun

Hi,
Try this:

set.seed(24)
df<- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),y= 
sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),z=rnorm(1e5))

#Used a shorter vector 
x1<- c(1.05,2.85,3.40,4.25,0.25)
y1<- c(0.25,0.10,0.90,0.25,1.05)

res<-do.call(rbind,lapply(seq_along(x1),function(i) 
subset(df,x==x1[i]&y==y1[i])))
head(res,2)
#    x    y  z
#466  1.05 0.25  0.7865224
#4119 1.05 0.25 -1.5679096
 tail(res,2)
# x    y  z
#98120 0.25 1.05 -2.1239596
#98178 0.25 1.05  0.3321464


A.K.
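
The original attempt, x==xt & y==yt, compares the columns with the recycled vectors position by position, so it does not test each (x, y) pair against every row. A sketch of one way to look up all pairs at once is merge() against a small data frame of wanted pairs; the toy grid and the name `wanted` are only illustrative.

```r
# Toy grid with a z value for every (x, y) combination
df <- expand.grid(x = c(0.25, 1.05, 2.85), y = c(0.10, 0.25, 1.05))
df$z <- seq_len(nrow(df))

# The (x, y) pairs to look up
wanted <- data.frame(x = c(1.05, 2.85, 0.25), y = c(0.25, 0.10, 1.05))

# Rows of df whose (x, y) pair occurs in wanted
res <- merge(wanted, df, by = c("x", "y"))
res
```

The result is sorted by x and y, not in the order of `wanted`.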

Hi Everyone, 

First time poster so any posting rules i should know about feel free to 
advise... 

I've got a data frame of 250 000 rows in columns of x y and z. 

i need to extract 20-30 rows from the data frame with specific x
 and y values, such that i can find the z value that corresponds. There 
is no repeated data. (its actually 250 000 squares in a 5x5m grid) 

to find them individually i can use subset successfully 

result<-subset(df,x==1.05 & y==0.25) 

gives me the row in the dataframe with that x and y value. 

so if i have 

x = 1.05 2.85 3.40 4.25 0.25 3.05 3.70 0.20 0.30 0.70 1.05 1.20 
1.40 1.90 2.70 3.25 3.55 4.60 2.05 2.15 3.70 4.85 4.90 1.60 2.45 3.20 
3.90 4.45 

and 

y= 0.25 0.10 0.90 0.25 1.05 1.70 2.05 2.90 2.35 2.60 2.55 2.15 
2.75 2.05 2.70 2.25 2.55 2.05 3.65 3.05 3.00 3.50 3.75 4.85 4.50 4.50 
3.35 4.90 

then how can i retrieve the rows for all those values at once. 

if i name x=xt and y=yt and then 

result<-subset(df,x==xt & y==yt) 

then i get 

result 
[1] x      y      Height 
<0 rows> (or 0-length row.names) 

i dont understand why zero rows are selected. obviously im 
applying the vectors inappropriately, but i cant seem to find anything 
on this method of subsetting online. 

Thanks for any replies!   




Re: [R] change cell values

2013-07-03 Thread arun
Hi,

set.seed(24) 
mat1=matrix(rnorm(12),3)
set.seed(28)
mat2=matrix(rnorm(12),3)
 indx<- mat1<1 & mat2<1
mat1[indx]<-NA
 mat2[indx]<-NA
 mat1
# [,1] [,2] [,3]    [,4]
#[1,]   NA   NA   NA 0.002311942
#[2,]   NA   NA   NA  NA
#[3,]   NA   NA   NA 0.598269113
 mat2
# [,1] [,2] [,3] [,4]
#[1,]   NA   NA   NA 1.841481
#[2,]   NA   NA   NA   NA
#[3,]   NA   NA   NA 1.520367
A.K.
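
One likely reason the two-line version in the question fails is that the condition is recomputed after mat1 already contains NA, so the second index is NA in exactly the cells that should change. A small deterministic sketch of computing the index once:

```r
mat1 <- matrix(c(0.5, 2, 0.2, 3), 2)
mat2 <- matrix(c(0.1, 0.3, 5, 4), 2)

# Compute the condition once, before either matrix is modified
indx <- mat1 < 1 & mat2 < 1

mat1[indx] <- NA
mat2[indx] <- NA
mat1
mat2
```

Only cell [1,1] is below 1 in both matrices, so only that cell becomes NA in each.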

- Original Message -
From: JiangZhengyu 
To: "r-help@r-project.org" 
Cc: 
Sent: Wednesday, July 3, 2013 5:27 PM
Subject: [R] change cell values




Dear R experts,

I have two  matrices (mat1 & mat2) with the same dimension & the cells (row and 
column) are corresponding to each other.

I want to change cell values to NA given values of the corresponding cells in 
mat1 and mat2 are both <1.

E.g. both mat1[2,3] and mat2[2,3] are <1, I will put mat1[2,3]=NA, and 
mat2[2,3]=NA; if either mat1[2,3]>=1 or  mat2[2,3]>=1, I will save both cells.

I tried the code below, but it is not working. Could anyone help fix the problem?

mat1[mat1<1&mat2<1]=NA
mat2[mat1<1&mat2<1]=NA


> mat1=matrix(rnorm(12),3)
> mat2=matrix(rnorm(12),3)
> mat1
           [,1]       [,2]       [,3]       [,4]
[1,] -1.3387075 -0.7142333 -0.5614211  0.1846955
[2,] -0.7936087 -0.2215797 -0.3686067  0.7328731
[3,]  0.6505082  0.1826019  1.5577883 -1.5580384
> mat2
           [,1]       [,2]       [,3]       [,4]
[1,]  0.4331573 -1.8086826 -1.7688123 -1.4278934
[2,] -0.1841451  0.1738648 -1.1086942  1.3065109
[3,] -1.0827245 -0.4143808 -0.6889405  0.4046203

                          


Re: [R] modify timestemp

2013-07-03 Thread arun
Hi,
Maybe this helps:

dat1 <- Dataset  # dataset from the question
dat1[,2]<-gsub("\\d+$","00",dat1[,2])
 dat1
# Date Time
#1  01/01/2013 00:09:00
#2  01/02/2013 00:10:00
#3  01/03/2013 00:11:00
#4  01/04/2013 00:12:00
#5  01/05/2013 00:13:00
#6  01/06/2013 00:15:00
#7  01/07/2013 00:16:00
#8  01/08/2013 00:17:00
#9  01/09/2013 00:18:00
#10 01/10/2013 00:19:00
A.K.
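
Two equivalent ways of zeroing the seconds are sketched below on a few of the time strings from the question: one anchors the pattern on the last colon, the other parses the time and formats it back (tz="UTC" is an assumption, used only to keep parsing independent of the local time zone).

```r
Time <- c("00:09:01", "00:10:14", "00:19:58")

# Replace the trailing ":ss" with ":00"
t1 <- sub(":\\d{2}$", ":00", Time)

# Parse as a time and print hours and minutes with literal "00" seconds
t2 <- format(strptime(Time, "%H:%M:%S", tz = "UTC"), "%H:%M:00")

t1
t2
```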


Hey All, 

I want to standardize my timestamp which is formatted as hh:mm:ss 

 My data looks like this: 

     Date     Time 
01/01/2013 00:09:01 
01/02/2013 00:10:14 
01/03/2013 00:11:27 
01/04/2013 00:12:40 
01/05/2013 00:13:53 
01/06/2013 00:15:06 
01/07/2013 00:16:19 
01/08/2013 00:17:32 
01/09/2013 00:18:45 
01/10/2013 00:19:58 

Dataset <- structure(list(Date = c("01/01/2013", "01/02/2013", 
"01/03/2013", 
"01/04/2013", "01/05/2013", "01/06/2013", "01/07/2013", "01/08/2013", 
"01/09/2013", "01/10/2013"), Time = c("00:09:01", "00:10:14", 
"00:11:27", "00:12:40", "00:13:53", "00:15:06", "00:16:19", "00:17:32", 
"00:18:45", "00:19:58")), .Names = c("Date", "Time"), class = "data.frame", 
row.names = c(NA, 
-10L)) 

I would like to change all the records in "Time" column uniformed as 
hh:mm:00, then the output would be this: 

Date     Time 
01/01/2013 00:09:00 
01/02/2013 00:10:00 
01/03/2013 00:11:00 
01/04/2013 00:12:00 
01/05/2013 00:13:00 
01/06/2013 00:15:00 
01/07/2013 00:16:00 
01/08/2013 00:17:00 
01/09/2013 00:18:00 
01/10/2013 00:19:00 

Thanks for your help!



Re: [R] Subsetting multiple rows of a data frame at once

2013-07-04 Thread arun
Hi,
Possibly R FAQ 7.31 (floating-point numbers cannot always be compared exactly with ==).
Using the same example:
set.seed(24)
df<- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),y= 
sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),z=rnorm(1e5)) 
dfOld<- df
 df[,1:2]<- lapply(df[,1:2],function(x) sprintf("%.2f",x))
x1<- c(1.05,2.85,3.40,4.25,0.25)
y1<- c(0.25,0.10,0.90,0.25,1.05) 
 x1New<-sprintf("%.2f",x1)
 y1New<- sprintf("%.2f",y1)
res1<-do.call(rbind,lapply(seq_along(x1New),function(i) 
subset(df,x==x1New[i]&y==y1New[i]))) 

res<-do.call(rbind,lapply(seq_along(x1),function(i) 
subset(dfOld,x==x1[i]&y==y1[i]))) 
dim(res1)
#[1] 318   3
  dim(res)
#[1] 250   3
 res1[,1:2]<- lapply(res1[,1:2],as.numeric)
str(res1)
#'data.frame':    318 obs. of  3 variables:
# $ x: num  1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 ...
# $ y: num  0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
# $ z: num  0.787 -1.568 -1.626 -0.221 -0.7 ...
A.K.
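
Besides converting to strings with sprintf(), the comparison can be made with a tolerance. A minimal sketch of the effect and of a tolerance-based lookup follows; the tolerance 1e-8 is an arbitrary choice for a grid with step 0.01.

```r
# Classic illustration of FAQ 7.31
exact <- (0.1 + 0.2) == 0.3                  # FALSE in double precision
close <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE, compares with a tolerance

# Looking up a grid value with a tolerance instead of ==
x <- seq(0, 5, by = 0.01)
hit <- which(abs(x - 3.05) < 1e-8)

exact
close
hit
```

Values produced by seq() need not be bit-for-bit identical to the typed literal, which would explain why some (x, y) pairs were skipped.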


Never mind, the error was on my side; I got it going. 

I have another issue: it leaves some values out. I've separately 
searched the df and they're definitely in there... so is there some sort 
of exclusion rule? About 8 of the 28 are missing... the first row 
missing is 3.05,1.70. I looked up the documentation for subset but I 
can't see why it would skip some... 

thanks 




Re: [R] Subsetting multiple rows of a data frame at once

2013-07-04 Thread arun
Hi,

carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
 dim(carbon.fit)
#[1] 251001  2


 xtNew<-sprintf("%.2f",xt)
 ytNew<- sprintf("%.2f",yt)
 carbon.fit[]<- lapply(carbon.fit,function(x) sprintf("%.2f",x))
res<-do.call(rbind,lapply(seq_along(xtNew),function(i) 
subset(carbon.fit,x==xtNew[i]&y==ytNew[i])))
 nrow(res)
#[1] 28
res
#  x    y
#12631  1.05 0.25
#5296   2.85 0.10
#45431  3.40 0.90
#12951  4.25 0.25
#52631  0.25 1.05
#85476  3.05 1.70
#103076 3.70 2.05
#145311 0.20 2.90
#117766 0.30 2.35
#130331 0.70 2.60
#127861 1.05 2.55
#107836 1.20 2.15
#137916 1.40 2.75
#102896 1.90 2.05
#135541 2.70 2.70
#113051 3.25 2.25
#128111 3.55 2.55
#103166 4.60 2.05
#183071 2.05 3.65
#153021 2.15 3.05
#150671 3.70 3.00
#175836 4.85 3.50
#188366 4.90 3.75
#243146 1.60 4.85
#225696 2.45 4.50
#225771 3.20 4.50
#168226 3.90 3.35
#245936 4.45 4.90
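
On why the original subset(df, x==xt & y==yt) gave zero rows: `==` compares element by element and recycles the shorter vector, so each row of the data frame is tested against only one (xt, yt) pair instead of against all of them. A minimal sketch on a small made-up integer grid (df, xt, yt and their values here are assumptions for illustration only; integers are used so that floating point issues do not interfere):

```r
# Small integer grid: 12 rows, x varies fastest
df <- expand.grid(x = 1:4, y = 1:3)

# Two (x, y) pairs we want to look up: (2, 3) and (3, 1)
xt <- c(2, 3)
yt <- c(3, 1)

# Element-wise comparison: xt and yt are recycled along the 12 rows,
# so row i is only compared with one of the two pairs
wrong <- subset(df, x == xt & y == yt)

# Pair-wise lookup: build one key per row and test membership in the
# set of wanted keys
wanted <- paste(xt, yt)
right <- df[paste(df$x, df$y) %in% wanted, ]

nrow(wrong)
nrow(right)
```

The lapply() loop above does the same pair-wise lookup one pair at a time.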
A.K.



From: Shaun ♥ Anika 
To: "smartpink...@yahoo.com"  
Sent: Thursday, July 4, 2013 12:08 AM
Subject: RE: Subsetting multiple rows of a data frame at once




Hi There,
i can give you the data needed to perform this task...

library(akima)
library(fields)

xt<- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, 1.20, 
1.40, 1.90, 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 
3.20, 3.90, 4.45)

yt<- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, 2.15, 
2.75, 2.05, 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 
4.50, 3.35, 4.90)

xs<- c(0.45, 1.05, 2.75, 3.30, 4.95, 0.40, 1.05, 2.30, 3.45, 4.60, 0.05, 1.95, 
2.95, 3.70, 4.55, 0.75, 1.60, 2.10, 3.60, 4.90, 0.05, 1.35, 2.60, 3.40, 4.25)

ys<- c(0.45, 0.95, 0.75, 0.95, 0.10, 1.90, 1.45, 1.25, 1.45, 1.05, 2.85, 2.60, 
2.05, 2.60, 2.55, 3.75, 3.30, 3.95, 3.45, 3.70, 4.95, 4.35, 4.55, 4.40, 4.95)

carbon<- c(1.43, 1.82, 1.40, 1.43, 1.96, 1.61, 1.91, 1.53, 1.17, 1.83, 2.43, 
2.02, 1.66, 2.45, 2.46, 1.39, 1.10, 1.38, 1.91, 2.13, 1.88, 1.26, 2.15, 1.89, 
1.69)

carbon.df=data.frame(x=xs,y=ys,z=carbon)
carbon.loess= loess(z~x*y, data= carbon.df, degree= 2)
carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
z=predict(carbon.loess, newdata= carbon.fit)
carbon.fit$Height=as.numeric(z)
image.plot(seq(0,5,0.01), seq(0,5,0.01), z, xlab = "", ylab="",main = "Carbon")

trees<-do.call(rbind,lapply(seq_along(xt),function(i) 
subset(carbon.fit,x==xt[i]&y==yt[i])))

## xt has 28 values, but when I run the above code it returns only 
18 of the 28 (xt,yt) pairs that I want. 

thanks for your help!!



Re: [R] how to choose dates data?

2013-07-04 Thread arun
Hi,
You could try:
day<-as.Date(c("2008-04-12","2011-07-02","2011-09-02","2008-04-12","2008-04-12"))
 indx<-gsub("-.*","",day)
 day[indx>="2007" & indx<="2009"]
#[1] "2008-04-12" "2008-04-12" "2008-04-12"

#or
library(xts)
xt1<- xts(seq_along(day),day)
index(xt1["2007/2009"])
#[1] "2008-04-12" "2008-04-12" "2008-04-12"

#or
library(chron)
yr1<-month.day.year(unclass(day))$year
day[yr1>=2007 & yr1<=2009]
#[1] "2008-04-12" "2008-04-12" "2008-04-12"
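
#or, as a base R sketch without extra packages: Date objects can be compared directly, so the range can be written with explicit bounds, or the year can be extracted with format(). I am assuming "from 2007 to 2009" means both years inclusive; the names in_range, sel1, yr and sel2 are only for illustration.

```r
day <- as.Date(c("2008-04-12", "2011-07-02", "2011-09-02",
                 "2008-04-12", "2008-04-12"))

# Option 1: compare against the first and last day of the interval
in_range <- day >= as.Date("2007-01-01") & day <= as.Date("2009-12-31")
sel1 <- day[in_range]

# Option 2: extract the year as an integer and compare numerically
yr <- as.integer(format(day, "%Y"))
sel2 <- day[yr >= 2007 & yr <= 2009]

sel1
```

Comparing numeric years (or Dates) avoids relying on string ordering, which the gsub() version above does.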
A.K.




- Original Message -
From: Gallon Li 
To: r-help 
Cc: 
Sent: Thursday, July 4, 2013 2:31 AM
Subject: [R] how to choose dates data?

i have converted my data into date format like below:

> day=as.Date(originaldate,"%m/%d/%Y")
> day[1:5]
[1] "2008-04-12" "2011-07-02" "2011-09-02" "2008-04-12" "2008-04-12"

I wish to select only those observations from 2007 to 2009, how can I
select from this list?



Re: [R] help on selecting values of an object

2013-07-04 Thread arun
Hi,
You could use:
d1<- data.frame(a,b)
k1<-data.frame(a=k)
library(plyr)
join(k1,d1,by="a")[,2]
#[1] 4 4 6 6 7 7 6
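
#or, a base R sketch with match(), which returns for each element of k its position in a (NA if a value of k does not occur in a). The data are the ones from the question.

```r
a <- c(0, 1, 2, 3)
b <- c(4, 5, 6, 7)
k <- c(0, 0, 2, 2, 3, 3, 2)

# Position of each k in a, then use those positions to index b
f <- b[match(k, a)]
f
```

With the matrix d from the question the same idea reads d[match(k, d[, "a"]), "b"].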
A.K.




- Original Message -
From: Andras Farkas 
To: r-help@r-project.org
Cc: 
Sent: Thursday, July 4, 2013 2:09 PM
Subject: [R] help on selecting values of an object

Dear List,

please provide some input on the following:
we have

a <-c(0,1,2,3)
b <-c(4,5,6,7)
d <-cbind(a,b)
k <-c(0,0,2,2,3,3,2)

"k" in this case consists of some values of "d[,1]" in a random sequence. What 
I am trying to do is to create an object "f" that would have the values of 
"d[,2]" in it based on "k", and again, "k" here is a vector that consists of 
some values of "d[,1]". Basically I am trying to match the values in "k" with 
their corresponding pairs in "d[,2]". So the result should look like:

f <-c(4,4,6,6,7,7,6)

appreciate your input

Andras



Re: [R] Subsetting multiple rows of a data frame at once

2013-07-05 Thread arun
Hi Anika,
?merge() is a better solution.

To get the row.names intact, you could do:
carbon.fit<- within(carbon.fit,{x<-round(x,10);y<- round(y,10)}) #Using Bill's solution

dat1<- data.frame(x=round(xt,10),y=round(yt,10))
carbon.fit1<- data.frame(carbon.fit,rNames=row.names(carbon.fit),
    stringsAsFactors=FALSE) #changed here
 res1<-merge(dat1,carbon.fit1,by=c("x","y"))
 row.names(res1)<- res1$rNames #refer to the column by name, in case carbon.fit has extra columns
 res1$rNames<- NULL
A.K.



- Original Message -
From: William Dunlap 
To: arun ; Shaun ♥ Anika 
Cc: R help 
Sent: Thursday, July 4, 2013 8:02 PM
Subject: RE: [R] Subsetting multiple rows of a data frame at once

> xt<- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, 
> 1.20, 1.40, 1.90,
> 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 3.20, 3.90, 
> 4.45)
> 
> yt<- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, 
> 2.15, 2.75, 2.05,
> 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 4.50, 3.35, 
> 4.90)
> carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01)))
> trees<-do.call(rbind,lapply(seq_along(xt),function(i) 
> subset(carbon.fit,x==xt[i]&y==yt[i])))
> 
> ## xt is 28 integers long and when i run the above code it only returns the 
> values of 18
> out of the 28 (xt,yt) pairs that i want.

You are running into the problem that two different computational methods that
give the same result when applied to real numbers often give different results
when applied to 64-bit floating point numbers.  (In your case you expect
seq(0,5,.01) to contain, e.g., the floating point number generated by parsing
the string "3.05".)  Hence x==y is not true when you expect it to be.  Here is
where your 18 came from:
   R> table(xt %in% carbon.fit$x, yt %in% carbon.fit$y)
          
           FALSE TRUE
     FALSE     1    6
     TRUE      3   18
Round your number to the nearest 10^-10 and you get
  > table(round(xt,10) %in% round(carbon.fit$x,10), round(yt,10) %in% 
round(carbon.fit$y,10))
        
         TRUE
    TRUE   28

By the way, you may prefer using the merge() function rather than the
do.call(rbind,lapply(...)) business.  I think the following call to merge will
do about what you want (the row names differ; if they are important it is
possible to get them with some minor trickery):
    merge(data.frame(x=xt,y=yt), carbon.fit)
(You still want to round your numbers as before.)
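
A minimal sketch of the effect on a small scale (the names grid and targets, the tolerance 1e-8 and the helper near() are my own choices for illustration, not part of the original code). Values that print identically need not be equal as doubles, so exact matching can miss grid points, while rounding both sides or comparing within a tolerance finds them:

```r
grid <- seq(0, 5, by = 0.01)          # computed as 0 + (0:500) * 0.01
targets <- c(1.05, 2.85, 3.05, 4.45)  # parsed directly from decimal literals

# Exact matching: may contain FALSE even though every target "is" on the grid
exact_hits <- targets %in% grid

# Rounding both sides to 10 decimals removes the tiny representation noise
rounded_hits <- round(targets, 10) %in% round(grid, 10)

# Alternative: treat two numbers as equal if they differ by less than tol
near <- function(a, b, tol = 1e-8) abs(a - b) < tol
tol_hits <- vapply(targets, function(t) any(near(grid, t)), logical(1))

exact_hits
rounded_hits
tol_hits
```

The tolerance only has to be much smaller than the grid spacing (0.01 here) and much larger than the rounding error of a double near 5.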

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


Re: [R] Filter Dataframe for Alarm for particular column(s).

2013-07-05 Thread arun
Hi,
Maybe this helps:
If you had shown your solution, it would be easier to compare.

res<-data.frame(lapply(sapply(MyDF[,c(2,4)],function(x) 
{x1<-which(c(0,diff(x))<0);x1[length(x1)==0]<-0;x1}),`[`,1))
 res
#  TNH BIX
#1   3   9
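
A variant that does not depend on what sapply() happens to return (a list when the columns have different numbers of drops, a matrix when they have the same number). This is only a sketch; the helper name first_drop is made up here. It returns the row at which the value is lower than in the previous row for the first time, or 0 if that never happens:

```r
MyDF <- data.frame(DWATT = c(1,1,1,1,1,0,0,0,1,1),
                   TNH   = c(1,1,0,1,1,0,0,0,1,1),
                   CSGV  = c(1,1,1,1,1,1,1,0,1,1),
                   BIX   = c(1,1,1,1,1,1,1,1,0,1))

first_drop <- function(x) {
  # diff(x)[i] is x[i + 1] - x[i]; a negative value marks a drop at row i + 1
  idx <- which(diff(x) < 0)
  if (length(idx) == 0L) 0L else idx[1L] + 1L
}

# One value per selected column, collected into a one-row data frame
res2 <- as.data.frame(lapply(MyDF[c("TNH", "BIX")], first_drop))
res2
```

For 0/1 columns a decrease is exactly a change from one to zero; diff() is vectorised, so this should also scale to long columns.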


#Speed

 set.seed(24)
 MyDFNew<- 
data.frame(TNH=sample(0:1,1e6,replace=TRUE),BIX=sample(0:1,1e6,replace=TRUE))
system.time(res1<-data.frame(lapply(sapply(MyDFNew,function(x) 
{x1<-which(c(0,diff(x))<0);x1[length(x1)==0]<-0;x1}),`[`,1)))
#   user  system elapsed 
#  0.364   0.000   0.363 

 res1
#  TNH BIX
#1   7   2
 MyDFNew[1:10,]
#   TNH BIX
#1    0   1
#2    0   0
#3    1   1
#4    1   1
#5    1   0
#6    1   0
#7    0   1
#8    1   1
#9    1   1
#10   0   0


A.K.


Hi,

I have a dataframe called MyDF. 

a<-c(1,1,1,1,1,0,0,0,1,1) 
b<-c(1,1,0,1,1,0,0,0,1,1) 
c<-c(1,1,1,1,1,1,1,0,1,1) 
d<-c(1,1,1,1,1,1,1,1,0,1) 
MyDF<-data.frame(DWATT=a,TNH=b,CSGV=c,BIX=d) 

My requirement: I need a function that returns, for particular column(s), 
the row number where the value changes from one to zero for the first 
time. If no such change happens, it should return zero. 

For example,  Using MyDF, 

DWATT TNH CSGV BIX 
1   1    1   1 
1   1    1   1 
1   0    1   1 
1   1    1   1 
1   1    1   1 
0   0    1   1 
0   0    1   1 
0   0    0   1 
1   1    1   0 
1   1    1   1 

Here I want to know the row number where the TNH and BIX column values 
change from one to zero for the first time. 

Note: if no change happens, it should return zero. 

The answer should be a dataframe with a single row, like this: 

TNH  BIX 
---  --- 
3    9 


I found some solutions using loops, but there are many files with many 
rows to process, so performance is most important. Could someone please 
suggest better ideas? 

Thanks, 
Antony.



Re: [R] Operations on a big data frame

2013-07-05 Thread arun
Hi,
Maybe this helps:
dat1<- read.table(text="
   P1_prom Nom
1 -6.17 Pt_00187
2 -6.17 Pt_00187
3 -6.17 Pt_00187
4 -6.17 Pt_00187
5 -6.17 Pt_00187
6 -6.17 Pt_01418
7 -5.77 Pt_01418
8 -5.37 Pt_01418
9 -4.97 Pt_01418
10 -4.57 Pt_01418
",sep="",header=TRUE,stringsAsFactors=FALSE) 

library(zoo)
 dat1$PT_promMean<-rollmean(dat1$P1_prom,5,fill=NA,align="left")
 dat1
#   P1_prom  Nom PT_promMean
#1    -6.17 Pt_00187   -6.17
#2    -6.17 Pt_00187   -6.17
#3    -6.17 Pt_00187   -6.09
#4    -6.17 Pt_00187   -5.93
#5    -6.17 Pt_00187   -5.69
#6    -6.17 Pt_01418   -5.37
#7    -5.77 Pt_01418  NA
#8    -5.37 Pt_01418  NA
#9    -4.97 Pt_01418  NA
#10   -4.57 Pt_01418  NA
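
The above is a rolling mean over five consecutive rows. If instead one mean per point (per value of Nom) is wanted, which is how I now read the question, a sketch with ave() and aggregate() would be (the object names dat2 and pt_means are made up for illustration):

```r
dat2 <- data.frame(
  P1_prom = c(-6.17, -6.17, -6.17, -6.17, -6.17,
              -6.17, -5.77, -5.37, -4.97, -4.57),
  Nom = rep(c("Pt_00187", "Pt_01418"), each = 5),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as the input: every row gets the mean
# of its own group, so it can be stored as a new column
dat2$PT_promMean <- ave(dat2$P1_prom, dat2$Nom)

# aggregate() returns one row per point instead
pt_means <- aggregate(P1_prom ~ Nom, data = dat2, FUN = mean)
pt_means
```

ave() keeps the original 25000 rows, while aggregate() gives the shorter table with one average per block of five.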
A.K.


Hello all, 

I have a big data frame that looks like this: 
        P1_prom Nom 
1   -6.17   Pt_00187 
2   -6.17   Pt_00187 
3   -6.17   Pt_00187 
4   -6.17   Pt_00187 
5   -6.17   Pt_00187 
6   -6.17   Pt_01418 
7   -5.77   Pt_01418 
8   -5.37   Pt_01418 
9   -4.97   Pt_01418 
10  -4.57   Pt_01418 
- 
- 
- 
25000 

where Nom represents a point in a map, and P1_prom represents 
the value of an operation we performed on each point (note that we 
performed 5 repetitions for each point, hence each point has 5 values). 
What I am trying to do, with no success, is to create a new column 
in which each row corresponds to the mean value of P1_prom for each 
point. So basically what I need the program to do is to write in the 
first row of the new column the average of the first five values of 
P1_prom, in the second row the average of the next five values, and so 
on. 
Could anybody guide me on how to do this? 
Thank you very much, 
Veronica



Re: [R] IF function

2013-07-05 Thread arun
Hi,
Maybe this helps.  

dat1<- read.table(text="
Col1,Col2
High value,9
Low value,0
High value,7
Low value,0
Low value,0
No data,0
High value,8
No data,0
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat1$Col2[dat1$Col1=="No data"]<- NA
 dat1
#    Col1 Col2
#1 High value    9
#2  Low value    0
#3 High value    7
#4  Low value    0
#5  Low value    0
#6    No data   NA
#7 High value    8
#8    No data   NA
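
Since the subject asks about an IF function: the vectorised form in R is ifelse(). A sketch of the same replacement (the name dat2 is only used to keep it separate from dat1 above); note that R uses NA for missing values, and functions such as mean() can skip them with na.rm=TRUE:

```r
dat2 <- data.frame(
  Col1 = c("High value", "Low value", "High value", "Low value",
           "Low value", "No data", "High value", "No data"),
  Col2 = c(9, 0, 7, 0, 0, 0, 8, 0),
  stringsAsFactors = FALSE
)

# Where Col1 is "No data" use NA, otherwise keep the existing value
dat2$Col2 <- ifelse(dat2$Col1 == "No data", NA, dat2$Col2)

# The zeros of "Low value" still count, the NA of "No data" do not
mean(dat2$Col2, na.rm = TRUE)
```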

A.K.


Hello, 

I am an R novice so excuse me if this is woefully straight forward, but I have 
tried the help files to no avail. 

I am trying to identify cells in 1 column with the value of 'No data', so I can 
change the values in the next column to 'Null'. 

Currently I am struggling with the data set, as it assigns both 'No data' and 
'Low values' as zero which skews my analysis. 

I've tried a number of different attempts but just get the error unexpected 
symbol ? 




Re: [R] geeglm

2013-07-06 Thread arun


Hi,

Using the example from ?geeglm()
 summary(gee1)$corr
 #     Estimate Std.err
#alpha    0.957 0.00979
A.K.

- Original Message -
From: nt1006 
To: r-help@r-project.org
Cc: 
Sent: Friday, July 5, 2013 9:40 AM
Subject: [R] geeglm

How to extract the Std.err and the alpha estimated value from the geeglm
function in R.





Re: [R] Subset and order

2013-07-07 Thread arun
Hi,
You could also try ?data.table()
x<- read.table(text="a    b    c
1    2    3
3    3    4
2    4    5
1    3    4
",sep="",header=TRUE)

library(data.table)

xt<- data.table(x)
 setkey(xt,a)
 subset(xt,b==3)
#   a b c
#1: 1 3 4
#2: 3 3 4



 iord <- order(x$a)
 subset(x[iord, ], b == 3) 
#  a b c
#4 1 3 4
#2 3 3 4
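
Another base R option, sketched on the same toy data: subset first and order only the rows that survive, so nothing the size of the full data frame is sorted or copied. The names sub and res are only for illustration.

```r
x <- data.frame(a = c(1, 3, 2, 1),
                b = c(2, 3, 4, 3),
                c = c(3, 4, 5, 4))

# Keep the rows with b == 3, then order that (smaller) result by a
sub <- x[x$b == 3, ]
res <- sub[order(sub$a), ]
res
```

Whether this beats a precomputed index depends on how selective the condition is; with about a hundred queries on 200,000 rows it is probably worth timing both.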


Speed comparison:
set.seed(12345)
dat1<- as.data.frame(matrix(sample(1:10,3*1e7,replace=TRUE),ncol=3))
colnames(dat1)<-letters[1:3]
system.time({
iord <- order(dat1$a)
res1<-subset(dat1[iord, ], b == 3)
})
#  user  system elapsed 
#  6.888   0.296   7.202 

dt1<- data.table(dat1)
system.time({setkey(dt1,a)
    resdt1<-subset(dt1,b==3)})
# user  system elapsed 
#   0.72    0.06    0.78 

head(resdt1)
#   a b  c
#1: 1 3  6
#2: 1 3  4
#3: 1 3 10
#4: 1 3  2
#5: 1 3  9
#6: 1 3  8
 head(res1)
#    a b  c
#75  1 3  6
#93  1 3  4
#300 1 3 10
#301 1 3  2
#437 1 3  9
#672 1 3  8

A.K.
- Original Message -
From: Rui Barradas 
To: Noah Silverman 
Cc: "R-help@r-project.org" 
Sent: Friday, July 5, 2013 3:51 PM
Subject: Re: [R] Subset and order

Hello,

If time is one of the problems, precompute an ordered index, and use it 
every time you want the df sorted. But that would mean you can't do it 
in a single operation.

iord <- order(x$a)
subset(x[iord, ], b == 3)


Rui Barradas

Em 05-07-2013 20:47, Noah Silverman escreveu:
> That would work, but is painfully slow.  It forces a new sort of the data 
> with every query.  I have 200,000 rows and need almost a hundred queries.
>
> Thanks,
>
> -N
>
>
> On Jul 5, 2013, at 12:43 PM, Rui Barradas  wrote:
>
>> Hello,
>>
>> Maybe like this?
>>
>> subset(x[order(x$a), ], b == 3)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 05-07-2013 20:33, Noah Silverman escreveu:
>>> Hello,
>>>
>>> I have a data frame with several columns.
>>>
>>> I'd like to select some subset *and* order by another field at the same 
>>> time.
>>>
>>> Example:
>>>
>>> a    b    c
>>> 1    2    3
>>> 3    3    4
>>> 2    4    5
>>> 1    3    4
>>> etc…
>>>
>>>
>>> I want to select all rows where b=3 and then order by a.
>>>
>>> To subset is easy:  x[x$b==3,]
>>> To order is easy: x[order(x$a),]
>>>
>>> Is there a way to do both in a single efficient statement?
>>>
>>> Thanks,
>>>
>>>
>>>
>>> --
>>> Noah Silverman, M.S., C.Phil
>>> UCLA Department of Statistics
>>> 8117 Math Sciences Building
>>> Los Angeles, CA 90095
>>>
>>>
>>>
>>>
>>>
>



Re: [R] Need hep for converting date data in POSIXct

2013-07-07 Thread arun
Hi,
I am not sure what your dataset looks like.  If it is like the one below: 
(otherwise, please provide a reproducible example using ?dput())

dat1<- read.table(text="
datetime
10/02/2010
02:30
11/02/2010
04:00
14/02/2010
06:30
",sep="",header=TRUE,stringsAsFactors=FALSE)

lst1<-split(dat1,(seq_along(dat1$datetime)-1)%%2+1)
 dat2<- data.frame(datetime=as.POSIXct(paste(lst1[[1]][,1],lst1[[2]][,1]),
     format="%d/%m/%Y %H:%M"))
 str(dat2)
#'data.frame':    3 obs. of  1 variable:
# $ datetime: POSIXct, format: "2010-02-10 02:30:00" "2010-02-11 04:00:00" ...
 dat2
# datetime
#1 2010-02-10 02:30:00
#2 2010-02-11 04:00:00
#3 2010-02-14 06:30:00


#or
data.frame(datetime=as.POSIXct(paste(dat1[seq(1,nrow(dat1),by=2),1],  
dat1[seq(2,nrow(dat1),by=2),1]),format="%d/%m/%Y %H:%M"))
# datetime
#1 2010-02-10 02:30:00
#2 2010-02-11 04:00:00
#3 2010-02-14 06:30:00
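
If the date and the time are instead in two separate columns, the same idea applies: paste them together row by row and convert once. This is a sketch under that assumption; the data frame dat3 and its column names are made up, and tz="UTC" is set only to make the result independent of the local time zone.

```r
dat3 <- data.frame(date = c("10/02/2010", "11/02/2010", "14/02/2010"),
                   time = c("02:30", "04:00", "06:30"),
                   stringsAsFactors = FALSE)

# "10/02/2010 02:30" etc., parsed as day/month/year hour:minute
dat3$datetime <- as.POSIXct(paste(dat3$date, dat3$time),
                            format = "%d/%m/%Y %H:%M", tz = "UTC")
str(dat3$datetime)
```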



A.K.



Hey everybody, 

I am a new user of R software. I don't know how I can merge two rows in 
one. In fact, I have one row with the date(dd/mm/) and another with the 
time (hh:mm) and I would like to get one row with date time in order to 
convert to POSIXct. How can I do it??



Re: [R] Splitting coordinates into two

2013-07-08 Thread arun


Hi,
vec1<- structure(c(.

vec1
#[1] -22.576608,17.07859  -24.621739,17.959728 -26.567955,18.134651
#[4] -22.832516,17.183304 -21.980459,16.91328 
#43 Levels: -17.394217,15.886574 -17.406994,14.393463 ... -28.017742,18.745594
G1<-sapply(strsplit(as.character(vec1),","),`[`,1)
 G2<-sapply(strsplit(as.character(vec1),","),`[`,2)
G1
#[1] "-22.576608" "-24.621739" "-26.567955" "-22.832516" "-21.980459"
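
A variant that splits only once and returns numbers rather than strings, as a sketch on three of the coordinates (the names g, parts, lat and lon are only for illustration). fixed=TRUE tells strsplit() to treat "," as a literal character, so no regular expression is involved:

```r
g <- factor(c("-22.576608,17.07859",
              "-24.621739,17.959728",
              "-26.567955,18.134651"))

# A factor must be converted to character before splitting;
# rbind the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(as.character(g), ",", fixed = TRUE))

lat <- as.numeric(parts[, 1])
lon <- as.numeric(parts[, 2])
lat
lon
```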
A.K.

- Original Message -
From: Pancho Mulongeni 
To: "r-help@r-project.org" 
Cc: 
Sent: Monday, July 8, 2013 9:49 AM
Subject: [R] Splitting coordinates into two

Hi users,
I have a simple vector of five coordinates in the form 
('lat1,long1','lat2,long2',...,'latn,longn')
and I would like to create two vectors, one with just the first coordinate 
and one with just the second:
G1<-c('lat1','lat2',...,'latn')
G2<-c('long1','long2',...,'longn')

I am trying to apply strsplit(x=g,split=',') on my object g, but it is not 
working, any help?
I struggle to understand how to use the regular expressions.
structure(c(32L, 38L, 40L, 34L, 27L), .Label = c("-17.394217,15.886574", 
"-17.406994,14.393463", "-17.491495,14.992905", "-17.5005,24.274635", 
"-17.776151,15.765724", "-17.779911,15.699806", "-17.905569,15.977211", 
"-17.921576,19.758911", "-18.607204,17.166481", "-18.804918,17.046661", 
"-18.805731,16.940403", "-19.030476,16.467304", "-19.12441,13.616567", 
"-19.163006,15.916443", "-19.243736,17.710304", "-19.562702,18.11697", 
"-19.6303,17.342606", "-19.939787,13.013306", "-20.107201,16.154966", 
"-20.363618,14.965954", "-20.460469,16.652012", "-20.484914,17.233429", 
"-21.256102,17.869263", "-21.418555,15.949402", "-21.491128,17.853234", 
"-21.943046,17.363892", "-21.980459,16.91328", "-22.000992,15.582733", 
"-22.084367,16.750031", "-22.182318,17.072754", "-22.447841,18.962746", 
"-22.576608,17.07859", "-22.649502,14.532166", "-22.832516,17.183304", 
"-22.934365,14.521008", "-22.947328,14.508991", "-24.45,15.801086", 
"-24.621739,17.959728", "-25.460983,19.438198", "-26.567955,18.134651", 
"-26.645292,15.153944", "-27.915553,17.490921", "-28.017742,18.745594"
), class = "factor")

Pancho Mulongeni
Research Assistant
PharmAccess Foundation
1 Fouché Street
Windhoek West
Windhoek
Namibia
 
Tel:   +264 61 419 000
Fax:  +264 61 419 001/2
Mob: +264 81 4456 286



Re: [R] Need hep for converting date data in POSIXct

2013-07-09 Thread arun
Hi Laila,
There is only one column from the dput() output.
time1<- structure(list(date
str(time1)
#'data.frame':    20 obs. of  1 variable:
# $ date: Factor w/ 582 levels "01/01/2009 01:58",..: 370 389 390 409 410 429 
#  430 450 451 471 ..
 time1[,1]<-as.POSIXct(time1[,1],format="%d/%m/%Y %H:%M")
head(time1)
 #    date
#1 2008-11-20 12:23:00
#2 2008-11-21 00:33:00
#3 2008-11-21 12:29:00
#4 2008-11-22 00:29:00
#5 2008-11-22 12:39:00
#6 2008-11-23 00:50:00


A.K.






From: laila Aranda Romero 
To: arun  
Sent: Monday, July 8, 2013 4:29 PM
Subject: RE: [R] Need hep for converting date data in POSIXct




Arun,

 When I type dput(head(time,20)), this appears:

structure(list(date = structure(c(370L, 389L, 390L, 409L, 410L, 
429L, 430L, 450L, 451L, 471L, 472L, 491L, 492L, 511L, 512L, 531L, 
532L, 549L, 550L, 567L), .Label = c("01/01/2009 01:58", "01/01/2009 13:57", 
"01/02/2009 03:49", "01/02/2009 15:51", "01/03/2009 04:40", "01/03/2009 16:37", 
"01/04/2009 04:21", "01/04/2009 16:33", "01/05/2009 04:33", "01/05/2009 16:31", 
"01/06/2009 03:11", "01/06/2009 15:10", "01/07/2009 02:49", "01/07/2009 14:46", 
"01/08/2009 02:44", "01/08/2009 14:44", "01/09/2009 01:05", "01/09/2009 13:14", 
"01/12/2008 00:58", "01/12/2008 12:53", "02/01/2009 02:01", "02/01/2009 13:58", 
"02/02/2009 03:59", "02/02/2009 15:58", "02/03/2009 04:37", "02/03/2009 16:25", 
"02/04/2009 04:30", "02/04/2009 16:30", "02/05/2009 04:33", "02/05/2009 16:31", 
"02/06/2009 02:52", "02/06/2009 14:57", "02/07/2009 02:47", "02/07/2009 14:51", 
"02/08/2009 02:42", "02/08/2009 14:42", "02/09/2009 01:14", "02/09/2009 13:19", 
"03/01/2009 01:52", "03/01/2009 13:57", "03/02/2009 03:55", "03/02/2009 15:56", 
"03/03/2009 04:21", "03/03/2009 16:29", "03/04/2009 04:39", "03/04/2009 16:29", 
"03/05/2009 04:27", "03/05/2009 16:24", "03/06/2009 02:53", "03/06/2009 14:48", 
"03/07/2009 02:55", "03/07/2009 14:54", "03/08/2009 02:36", "03/08/2009 14:28", 
"03/09/2009 01:32", "03/09/2009 13:37", "04/01/2009 01:57", "04/01/2009 13:57", 
"04/02/2009 03:55", "04/02/2009 15:50", "04/03/2009 04:35", "04/03/2009 16:35", 
"04/04/2009 04:28", "04/04/2009 16:36", "04/05/2009 04:43", "04/05/2009 16:43", 
"04/06/2009 02:36", "04/06/2009 14:40", "04/07/2009 02:49", "04/07/2009 14:48", 
"04/08/2009 02:40", "04/08/2009 14:38", "04/09/2009 01:45", "04/09/2009 13:54", 
"05/01/2009 02:02", "05/01/2009 14:01", "05/02/2009 03:51", "05/02/2009 15:49", 
"05/03/2009 04:35", "05/03/2009 16:40", "05/04/2009 04:36", "05/04/2009 16:29", 
"05/05/2009 04:18", "05/05/2009 16:13", "05/06/2009 02:41", "05/06/2009 14:22", 
"05/07/2009 02:50", "05/07/2009 14:57", "05/08/2009 02:31", "05/08/2009 14:28", 
"05/09/2009 02:08", "05/09/2009 14:13", "06/01/2009 01:55", "06/01/2009 13:52", 
"06/02/2009 03:54", "06/02/2009 15:55", "06/03/2009 04:39", "06/03/2009 16:40", 
"06/04/2009 04:20", "06/04/2009 16:19", "06/05/2009 03:56", "06/05/2009 15:49", 
"06/06/2009 02:20", "06/06/2009 14:26", "06/07/2009 03:10", "06/07/2009 15:05", 
"06/08/2009 02:35", "06/08/2009 14:35", "06/09/2009 02:10", "06/09/2009 14:01", 
"06/12/2008 12:27", "07/01/2009 01:54", "07/01/2009 13:38", "07/02/2009 03:49", 
"07/02/2009 15:50", "07/03/2009 04:53", "07/03/2009 16:33", "07/04/2009 04:23", 
"07/04/2009 16:22", "07/05/2009 03:33", "07/05/2009 15:34", "07/06/2009 02:40", 
"07/06/2009 14:59", "07/07/2009 02:52", "07/07/2009 14:55", "07/08/2009 02:34", 
"07/08/2009 14:37", "07/09/2009 01:59", "07/09/2009 13:45", "07/12/2008 00:28", 
"07/12/2008 12:33", "08/01/2009 01:23", "08/01/2009 13:09", "08/02/2009 03:52", 
"08/02/2009 15:51&q

Re: [R] regular expression strikes again

2013-07-09 Thread arun
Hi,
Maybe this helps:

  gsub(".*\\w+\\s+(.*)\\s+.*","\\1",test)
 #[1] "9,36"  "9,36"  "9,66"  "9,66"  "9,66"  "10,04" "10,04" "10,04" "6,13" 
#[10] "6,13"  "6,13" 
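
The reason the pattern with "^.*" returns 0,04: the leading .* is greedy and takes as many characters as it can, leaving the capture group only the single digit it needs before the comma. Two sketches that avoid this, shown on a shortened copy of the data (test2, res_a and res_b are names made up here): extract the first "digits,digits" directly, or make the leading part non-greedy with perl=TRUE.

```r
test2 <- c("pH 9,36 2", "pH 10,04 1", "RGLP 144006 pH 6,13 1")

# Option 1: no substitution at all, just return the first match
res_a <- regmatches(test2, regexpr("[[:digit:]]+,[[:digit:]]+", test2))

# Option 2: ".*?" matches as few characters as possible, so the group
# starts at the first digit of the number before the comma
res_b <- sub("^.*?([[:digit:]]+,[[:digit:]]+).*$", "\\1", test2, perl = TRUE)

res_a
res_b
```

Note that regmatches() drops elements without a match, while sub() returns them unchanged.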

A.K.

- Original Message -
From: PIKAL Petr 
To: r-help 
Cc: 
Sent: Tuesday, July 9, 2013 5:45 AM
Subject: [R] regular expression strikes again

Dear experts in regexpr.

I have this

dput(test[500:510])
c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3", 
"pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1", 
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

and I want something like this

gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
[1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
[11] "6,13"

but with 10,04 values instead of 0,04.

I tried
gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])

or other variations but without any success.

Please help.

Regards
Petr



Re: [R] Labelling

2013-07-09 Thread arun
Hi,
Maybe this helps:

 gsub("_"," ",gsub("(.*)_.*","\\1",DATA_names))
#[1] "A ugkg"  "S mgkg"  "Cl mgkg"
sapply(gsub("_"," ",gsub("(.*)_.*","\\1",DATA_names)),f)
$`A ugkg`
A ~ (mu * g ~ kg^{
    -1
})

$`S mgkg`
S ~ (mg ~ kg^{
    -1
})

$`Cl mgkg`
Cl ~ (mg ~ kg^{
    -1
})


A.K.

- Original Message -
From: Shane Carey 
To: "r-help@r-project.org" 
Cc: 
Sent: Tuesday, July 9, 2013 7:20 AM
Subject: [R] Labelling

Hi,

I have the following data as labels:

DATA_names<-c("A_ugkg_FA","S_mgkg_XRF" ,"Cl_mgkg_XR")

and I need to convert to


             -1
A (ug kg     )

             -1
S (mg kg    )

              -1
Cl (mg kg    )


I used the following piece of code to convert the following labels in the
past, but cant get it to work for the new labels:

f <- function (name)
{
  # add other suffixes and their corresponding plotmath expressions to the list
  env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
                       ugkg = bquote(mu * g ~ kg^{-1})),
                  parent = emptyenv())
  pattern <- paste0("(", paste(objects(env), collapse="|"), ")")
  bquoteExpr <- parse(text=gsub(pattern,
                                "~(.(\\1))",
                                name))[[1]]
  # I use do.call() to work around the fact that bquote's first argument is
  # not evaluated.
  do.call(bquote, list(bquoteExpr, env))
}

The labels in the past were:
DATA_names<-c("A_ugkg","S_mgkg" ,"Cl_mgkg")

Thanks

-- 
Shane



Re: [R] Labelling

2013-07-09 Thread arun
Hi,
Try this:
f1<- function(name)
{
env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
  ugkg = bquote(mu * g ~ kg^{-1})),
        parent = emptyenv())
pattern <- paste0("(", paste(objects(env), collapse="|"), ")")    
bquoteExpr<-parse(text=gsub("_"," ",gsub(pattern,"~(.(\\1))~",name)))[[1]]
do.call(bquote, list(bquoteExpr, env))
}
sapply(DATA_names,f1)
$A_ugkg_FA
A ~ (mu * g ~ kg^{
    -1
}) ~ FA

$S_mgkg_XRF
S ~ (mg ~ kg^{
    -1
}) ~ XRF

$Cl_mgkg_XR
Cl ~ (mg ~ kg^{
    -1
}) ~ XR

A.K.







From: Shane Carey 
To: arun  
Cc: R help  
Sent: Tuesday, July 9, 2013 8:57 AM
Subject: Re: [R] Labelling



Initially, I wanted to remove the suffixes, but now I want to end up with the 
following 

c("A_ugkg_FA","S_mgkg_XRF" ,"Cl_mgkg_XR")


             -1
A (ug kg    ) FA


             -1
S (mg kg   ) XRF


              -1
Cl (mg kg   ) XR

Thanks all




-- 
Shane

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kruskal.test

2013-07-09 Thread arun
Hi,
?kruskal.test()
 a<- c(2,4,5,2,7)
 b<- c(2,2,6)
 c<- c(3,7,9,3)
 kruskal.test(list(a,b,c))
#
 #  Kruskal-Wallis rank sum test
#
#data:  list(a, b, c)
#Kruskal-Wallis chi-squared = 2.003, df = 2, p-value = 0.3673
A.K.
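
If the data arrive as one row of values with a group label above each value, as in the question, the test can also be run on a value vector plus a grouping factor. A minimal sketch, assuming the values are typed in the order shown in the question (`x` and `g` are names chosen here):

```r
# Values in the order given in the question, one group label per value
x <- c(2, 4, 5, 2, 7, 2, 2, 6, 3, 7, 9, 3)
g <- factor(rep(c("a", "b", "c"), times = c(5, 3, 4)))

res <- kruskal.test(x, g)           # same test as kruskal.test(list(a, b, c))
res_formula <- kruskal.test(x ~ g)  # formula interface, same result
```

Both calls should reproduce the chi-squared value printed above.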


Hi 

I need an expression in R to apply kruskal.test to this data (for example). 
a   a   a   a    a    b   b    b    c    c   c    c 
2  4    5   2    7    2   2    6    3    7   9    3 
a, b and c could be considered different vectors. How can I apply this
test to this data? (The data probably isn't suited to this test, but I 
only need the expression.) 

Thank you very much



Re: [R] Replacing part of delimited string with R's regex

2013-07-10 Thread arun
Hi
You could use:
 gsub("([[:alnum:]]+-)([[:alnum:]]+-)(.*)","\\1\\2zzz",name)
#[1] "hsa-miR-zzz" "hsa-miR-zzz" "hsa-let-zzz"
A.K.
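
A slightly different pattern, shown only as a sketch, anchors the match at the start of the string and treats a field as anything that is not a "-". It assumes the delimiter is always "-"; strings with fewer than two delimiters are left unchanged.

```r
name <- c("hsa-miR-555p", "hsa-miR-519b-3p", "hsa-let-7a")

# Keep the first two "-"-delimited fields, replace everything after them
out <- sub("^([^-]*-[^-]*-).*$", "\\1zzz", name)

# A string without a second delimiter does not match and is returned as is
unchanged <- sub("^([^-]*-[^-]*-).*$", "\\1zzz", "hsa-miR")
```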




- Original Message -
From: Gundala Viswanath 
To: "r-h...@stat.math.ethz.ch" 
Cc: 
Sent: Wednesday, July 10, 2013 3:02 AM
Subject: [R] Replacing part of delimited string with R's regex

I have the following list of strings:

name <- c("hsa-miR-555p","hsa-miR-519b-3p","hsa-let-7a")

What I want to do is for each of the above strings
replace the text after second delimiter with "zzz".
Yielding:

hsa-miR-zzz
hsa-miR-zzz
hsa-let-zzz

What's the way to do it?



Re: [R] Kruskal.test

2013-07-10 Thread arun
Hi,
Please dput() your example dataset.

dat1<- read.table(text="a   a   a   a    a    b   b    b    c    c   c    c
2  4    5   2    7    2   2    6    3    7   9    3
3  3   4   1    6    8   1    3    5    2    6    3",sep="",header=FALSE,stringsAsFactors=FALSE)
library(reshape)
 dat2<-melt(as.data.frame(t(dat1)),id.var="V1")[,-2]
kruskal.test(value~V1,data=dat2)
#
#    Kruskal-Wallis rank sum test
#
#data:  value by V1
#Kruskal-Wallis chi-squared = 1.2888, df = 2, p-value = 0.525

#I guess you wanted for each row:
lapply(split(dat2,(seq_len(nrow(dat2))-1)%/%ncol(dat1)+1),function(x) 
kruskal.test(value~V1,data=x))
#$`1`
#
#    Kruskal-Wallis rank sum test
#
#data:  value by V1
#Kruskal-Wallis chi-squared = 2.003, df = 2, p-value = 0.3673
#

#$`2`

#    Kruskal-Wallis rank sum test

#data:  value by V1
#Kruskal-Wallis chi-squared = 0.1231, df = 2, p-value = 0.9403



A.K.
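
The row-wise test can also be written without reshaping, by keeping the group labels in one factor and the measurements in a numeric matrix. This is a sketch in base R only; `groups`, `vals`, `tests` and `pvals` are names chosen here, and the data are typed in from the question.

```r
groups <- factor(c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"))
vals <- rbind(c(2, 4, 5, 2, 7, 2, 2, 6, 3, 7, 9, 3),
              c(3, 3, 4, 1, 6, 8, 1, 3, 5, 2, 6, 3))

# One Kruskal-Wallis test per row of the matrix
tests <- lapply(seq_len(nrow(vals)),
                function(i) kruskal.test(vals[i, ], groups))

# Collect the p-values into a plain numeric vector
pvals <- vapply(tests, function(tt) tt$p.value, numeric(1))
```

The p-values should agree with the two results printed above.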

____
From: Vera Costa 
To: arun  
Sent: Wednesday, July 10, 2013 6:38 AM
Subject: Re: Kruskal.test



Thank you. 
And if I have 

a   a   a   a    a    b   b    b    c    c   c    c
2  4    5   2    7    2   2    6    3    7   9    3
3  3   4   1 6    8   1    3    5    2    6    3   ?

How can I apply the test by row?

Thank you



Re: [R] Filter Dataframe for Alarm for particular column(s).

2013-07-10 Thread arun
Hi,
You could try ?data.table() to further increase the speed:



#Same example:
library(data.table)
dt2<- data.table(MyDFNew)
system.time(resNew<- dt2[,lapply(.SD,function(x) 
{x1<-which(c(0,diff(x))<0);x1[length(x1)==0]<-0;x1})][1] )
 # user  system elapsed 
 # 0.144   0.004   0.148 
resNew
#   TNH BIX
#1:   7   2


According to this link 
(http://stackoverflow.com/questions/9236438/how-do-i-run-apply-on-a-data-table),
 using a for loop should improve the speed further.

Regarding the use of ts() in this case, I am not sure.

A.K.
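
For reference, the core of the calculation can be put in a small named function, which may be easier to read than the inline version. A minimal sketch in base R, assuming the columns hold only 0/1 values (`first_drop` is a name made up here):

```r
# Row number of the first 1 -> 0 change in a 0/1 vector, or 0 if there is none
first_drop <- function(x) {
  idx <- which(diff(x) < 0) + 1L   # diff() is -1 exactly where a 1 is followed by a 0
  if (length(idx) > 0) idx[1] else 0L
}

MyDF <- data.frame(DWATT = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1),
                   TNH   = c(1, 1, 0, 1, 1, 0, 0, 0, 1, 1),
                   CSGV  = c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1),
                   BIX   = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1))

# One-row data frame with the first drop for the selected columns
res <- as.data.frame(lapply(MyDF[c("TNH", "BIX")], first_drop))
```

Since each column yields a single number, the result is already a one-row data frame and no further subsetting is needed.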



- Original Message -
From: R_Antony 
To: r-help@r-project.org
Cc: 
Sent: Wednesday, July 10, 2013 1:48 AM
Subject: Re: [R] Filter Dataframe for Alarm for particular column(s).

Hi Arun,



Thanks for the solution, it really works! But how can we avoid even lapply() 
and sapply()?

Is there any way to do it with ts()?

Thanks,

Antony.



From: arun kirshna [via R] [mailto:ml-node+s789695n467097...@n4.nabble.com] 
Sent: Saturday, July 06, 2013 12:54 AM
To: Akkara, Antony (GE Power & Water, Non-GE)
Subject: Re: Filter Dataframe for Alarm for particular column(s).



Hi, 
Maybe this helps: 
If you had shown your solution, it would be easier to compare. 

res<-data.frame(lapply(sapply(MyDF[,c(2,4)],function(x) 
{x1<-which(c(0,diff(x))<0);x1[length(x1)==0]<-0;x1}),`[`,1)) 
res 
#  TNH BIX 
#1   3   9 


#Speed 

set.seed(24) 
MyDFNew<- 
data.frame(TNH=sample(0:1,1e6,replace=TRUE),BIX=sample(0:1,1e6,replace=TRUE)) 
system.time(res1<-data.frame(lapply(sapply(MyDFNew,function(x) 
{x1<-which(c(0,diff(x))<0);x1[length(x1)==0]<-0;x1}),`[`,1))) 
#   user  system elapsed 
#  0.364   0.000   0.363 

res1 
#  TNH BIX 
#1   7   2 
MyDFNew[1:10,] 
#   TNH BIX 
#1    0   1 
#2    0   0 
#3    1   1 
#4    1   1 
#5    1   0 
#6    1   0 
#7    0   1 
#8    1   1 
#9    1   1 
#10   0   0 


A.K. 


Hi, 


Here I have a data frame called MyDF. 

a<-c(1,1,1,1,1,0,0,0,1,1) 
b<-c(1,1,0,1,1,0,0,0,1,1) 
c<-c(1,1,1,1,1,1,1,0,1,1) 
d<-c(1,1,1,1,1,1,1,1,0,1) 
MyDF<-data.frame(DWATT=a,TNH=b,CSGV=c,BIX=d) 

I need a function that returns, for particular column(s), the row number 
at which the value first changes from one to zero. If no such change 
happens, it should return zero. 

For example,  Using MyDF, 

DWATT TNH CSGV BIX 
1   1    1   1 
1   1    1   1 
1   0    1   1 
1   1    1   1 
1   1    1   1 
0   0    1   1 
0   0    1   1 
0   0    0   1 
1   1    1   0 
1   1    1   1 

Here I want to know the row number where the TNH and BIX column values 
change from one to zero for the first time. 

Note: if no such change happens, it should return zero. 

The answer should be a data frame with a single row. 
So here the answer should be a data frame like this: 

TNH  BIX 
    -- 
3      9 


I found some solutions using loops, but there are many files with 
many rows to process, so performance is most important. 
Could someone please suggest better ideas? 

Thanks, 
Antony. 








Re: [R] create new matrix from user-defined function

2013-07-10 Thread arun
Hi,
You could try:
 
mat1<-matrix(dat3[rowSums(dat3[,2:3])!=dat3[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))
 mat1
# MW_EEsDue_ERRORS
#[1,] 1882
#[2,] 1884
#[3,] 1885
A.K.
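
The same selection can be written with column names instead of positions, which may be easier to maintain when the real data frame has many columns. A minimal sketch using the example data from the question (`mismatch` and `errs` are names chosen here):

```r
dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

# TRUE for rows where the stated total differs from the sum of its parts
mismatch <- dat3$D_MW_EEsDueTotal != dat3$B_MW_EEsDue1 + dat3$C_MW_EEsDue2

# One-column matrix of the offending case IDs
errs <- matrix(dat3$A_CaseID[mismatch], ncol = 1,
               dimnames = list(NULL, "MW_EEsDue_ERRORS"))
```

If the columns can contain NA, the comparison returns NA for those rows, so they would need separate handling.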


#Let's say I have the following data set: 

dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885), 
                  B_MW_EEsDue1 = c(2, 2, 1, 4, 6), 
                  C_MW_EEsDue2 = c(5, 5, 4, 1, 6), 
                  D_MW_EEsDueTotal = c(7, 9, 5, 6, 112)) 
dat3 
# A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal 
# 1     1881            2            5                7 
# 2     1882            2            5                9 
# 3     1883            1            4                5 
# 4     1884            4            1                6 
# 5     1885            6            6              112 

# I want to: 
#CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s WHERE "D != B + C" 
#THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this example should be: 

# MW_EEsDue_ERRORS 
# 1 1882 
# 2 1884 
# 3 1885 

#What is the best way to do this?  Thanks for your time.  BNC



Re: [R] Need hep for converting date data in POSIXct

2013-07-10 Thread arun


Hi,
I guess the error message:
> vmask(lat,lon,time,vmax=25)
Error in vmask(lat, lon, time, vmax = 25) : object 'lat' not found

says that you have not defined the object 'lat'.

time<-subset(Geo, select =date)
time[,1]<-  as.POSIXct(time[,1],format="%d/%m/%Y %H:%M")
location<- subset(Geo,select=c(lat.comp,long))
 time1<- time[,1]
 lat<- location[,1]
 long<- location[,2]
library(argosfilter)
 vmask(lat,long,time1,25)
#[1] "end_location" "end_location" "not"  "not"  "end_location"
#[6] "end_location"

A.K.
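
The point is that the function needs plain vectors, not one-column data frames, and that the objects passed in must exist under the names used in the call. A minimal sketch with three rows typed in from head(Geo); the time zone "UTC" and the names `dtime`, `lat`, `lon` and `gaps` are assumptions made for this illustration.

```r
Geo <- data.frame(date = c("20/11/2008 12:23", "21/11/2008 00:33", "21/11/2008 12:29"),
                  lat.comp = c(-42.72, -40.63, -38.60),
                  long = c(9.31, 11.66, 10.88),
                  stringsAsFactors = FALSE)

# as.character() keeps this working if 'date' was read in as a factor
dtime <- as.POSIXct(as.character(Geo$date), format = "%d/%m/%Y %H:%M", tz = "UTC")
lat <- Geo$lat.comp   # plain numeric vectors, extracted with $
lon <- Geo$long

# Hours between consecutive fixes, as a quick check of the conversion
gaps <- diff(as.numeric(dtime)) / 3600

# With argosfilter installed, the call from the thread would then be:
# library(argosfilter); vmask(lat, lon, dtime, vmax = 25)
```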

From: laila Aranda Romero 
To: arun  
Sent: Wednesday, July 10, 2013 6:21 PM
Subject: RE: [R] Need hep for converting date data in POSIXct





Hi,

The code: 

library(argosfilter)
setwd("C:/Users/Usuario/Dropbox/Laila Aranda/PUFGRA")
Geo = 
read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",header=FALSE,sep
= ",", col.names= c("type", "date",
"secs", "Trans1",  "Trans2",
"lat.sta",  "lat.comp", "long", 
"dist", "rumbo", "velocidad", 
"confianza"))
View(Geo)
location=subset(Geo, select= c(lat.comp,long))
time=subset(Geo, select =c(date))
time[,1]<-as.POSIXct(time[,1],format="%d/%m/%Y %H:%M")
vmask(lat,lon,time,vmax=25)




The example:

> library(argosfilter)
> setwd("C:/Users/Usuario/Dropbox/LailaAranda/PUFGRA")
> Geo = read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",header=FALSE,sep= ",",
+   col.names= c("type", "date","secs", "Trans1", "Trans2", "lat.sta",
+   "lat.comp", "long", "dist", "rumbo", "velocidad",  "confianza"))
> str(Geo)
'data.frame':  582 obs. of  12 variables:
 $ type     : Factor w/ 2 levels "midnight","noon": 2 1 2 1 2 1 2 1 2 1 ...
 $ date     : Factor w/ 582 levels "01/01/2009 01:58",..: 370 389 390 409 410 429 430 450 451 471 ...
 $ secs     : num  39773 39773 39774 39774 39775 ...
 $ Trans1   : Factor w/ 186 levels "04:06","04:08",..: 14 17 17 16 16 28 28 19 19 15 ...
 $ Trans2   : Factor w/ 159 levels "00:01","00:03",..: 30 30 28 28 34 34 35 35 36 36 ...
 $ lat.sta  : num  -42.7 -39.1 -37.8 -37.9 -41.2 ...
 $ lat.comp : num  -42.7 -40.6 -38.6 -37.9 -39 ...
 $ long     : num  9.31 11.66 10.88 10.72 13.06 ...
 $ dist     : num  0 0 127 45 131 ...
 $ rumbo    : num  0 0 -16.49 -9.64 -57.22 ...
 $ velocidad: num  0 0 10.64 3.75 10.75 ...
 $ confianza: int  3 9 9 9 9 6 6 9 9 9 ...
> head(Geo)
      type             date     secs Trans1 Trans2 lat.sta lat.comp  long   dist
1     noon 20/11/2008 12:23 39772.52  04:59  19:47  -42.72   -42.72  9.31   0.00
2 midnight 21/11/2008 00:33 39773.02  05:18  19:47  -39.14   -40.63 11.66   0.00
3     noon 21/11/2008 12:29 39773.52  05:18  19:41  -37.82   -38.60 10.88 127.02
4 midnight 22/11/2008 00:29 39774.02  05:17  19:41  -37.86   -37.86 10.72  45.04
5     noon 22/11/2008 12:39 39774.53  05:17  20:00  -41.21   -39.04 13.06 130.78
6 midnight 23/11/2008 00:50 39775.03  05:41  20:00  -36.56   -38.51 16.02 142.06
   rumbo velocidad confianza
1   0.00      0.00         3
2   0.00      0.00         9
3 -16.49     10.64         9
4  -9.64      3.75         9
5 -57.22     10.75         9
6  77.07     11.66         6
> location=subset(Geo, select= c(lat.comp,long))
> str(location)
'data.frame':  582 obs. of  2 variables:
 $ lat.comp: num  -42.7 -40.6 -38.6 -37.9 -39 ...
 $ long    : num  9.31 11.66 10.88 10.72 13.06 ...
> head(location)
  lat.comp  long
1   -42.72  9.31
2   -40.63 11.66
3   -38.60 10.88
4   -37.86 10.72
5   -39.04 13.06
6   -38.51 16.02
> time=subset(Geo, select =c(date))
> time[,1]<-as.POSIXct(time[,1],format="%d/%m/%Y %H:%M")
> str(time)
'data.frame':  582 obs. of  1 variable:
 $ date: POSIXct, format: "2008-11-20 12:23:00" "2008-11-21 00:33:00" ...
> head(time)
                 date
1 2008-11-20 12:23:00
2 2008-11-21 00:33:00
3 2008-11-21 12:29:00
4 2008-11-22 00:29:00
5 2008-11-22 12:39:00
6 2008-11-23 00:50:00
> vmask(lat,lon,time,vmax=25)
Error in vmask(lat, lon, time, vmax = 25) : object 'lat' not found



Re: [R] calculate time from dates

2013-07-11 Thread arun
Hi,
Maybe this helps:


dat1<- read.table(text="
ID date
1 4/12/2008
1 4/13/2008
1 5/11/2008
2 3/21/2009
2 4/22/2009
2 8/05/2009
",sep="",header=TRUE,stringsAsFactors=FALSE)
library(mondate)
M1<- mondate(dat1[,2])
M2<- mondate("01/01/2008")
dat1$month<-as.numeric(abs(floor(MonthsBetween(M1,M2))))
 dat1
#  ID  date month
#1  1 4/12/2008 4
#2  1 4/13/2008 4
#3  1 5/11/2008 5
#4  2 3/21/2009    15
#5  2 4/22/2009    16
#6  2 8/05/2009    20
A.K.
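
The month count can also be computed in base R, and the second part of the question (keeping only the last row when an ID has two dates in the same month) can be handled with duplicated(). This is a sketch under two assumptions: the rows are already in date order within each ID, and January 2008 counts as month 1. The names `d`, `keep` and `res` are chosen here.

```r
dat1 <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                   date = c("4/12/2008", "4/13/2008", "5/11/2008",
                            "3/21/2009", "4/22/2009", "8/05/2009"),
                   stringsAsFactors = FALSE)

d <- as.Date(dat1$date, format = "%m/%d/%Y")

# Months since the start of 2008: 12 per elapsed year plus the calendar month
dat1$month <- (as.integer(format(d, "%Y")) - 2008L) * 12L +
  as.integer(format(d, "%m"))

# Keep only the last row of each ID/month combination
keep <- !duplicated(dat1[c("ID", "month")], fromLast = TRUE)
res <- dat1[keep, ]
```

With the example data this drops the 4/12/2008 row for ID 1 and keeps 4/13/2008.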



- Original Message -
From: Gallon Li 
To: r-help 
Cc: 
Sent: Thursday, July 11, 2013 5:56 AM
Subject: [R] calculate time from dates

My data are from 2008 to 2010, with repeated measures for same subjects. I
wish to compute number of months since january 2008.

The data are like the following:

ID date
1 4/12/2008
1 4/13/2008
1 5/11/2008
2 3/21/2009
2 4/22/2009
2 8/05/2009
...

The date column is in the format "%m/%d/%Y". I wish to obtain

ID month
1 4
1 4
1 5
2 15
2 16
2 20
...

Also, for the same ID with two identical months, I only want to keep the
last one. Can any expert help with this question?



Re: [R] Read a txt file as numeric

2013-07-11 Thread arun
Hi,
Maybe this helps:
dat1<- read.table(text="
142,QUANTIZE_CAL_MIN_BAND_10,1
143,QUANTIZE_CAL_MAX_BAND_11,65535
144,QUANTIZE_CAL_MIN_BAND_11,1
145,END_GROUP,MIN_MAX_PIXEL_VALUE
146,GROUP,RADIOMETRIC_RESCALING
147,RADIANCE_MULT_BAND_1,1.2483E-02
148,RADIANCE_MULT_BAND_2,1.2730E-02
",sep=",",header=FALSE,stringsAsFactors=FALSE,row.names=1)

#Assuming that 142, 143, etc are row.names.
#You could create a new column with just the numeric values leaving the strings 
#in the 2nd column.
dat1$NewCol<-as.numeric(ifelse(grepl("\\d+",dat1[,2]),dat1[,2],NA))
dat1[,2][grepl("\\d+",dat1[,2])]<-NA
dat1
#                          V2                    V3     NewCol
#142 QUANTIZE_CAL_MIN_BAND_10                  <NA> 1.0000e+00
#143 QUANTIZE_CAL_MAX_BAND_11                  <NA> 6.5535e+04
#144 QUANTIZE_CAL_MIN_BAND_11                  <NA> 1.0000e+00
#145                END_GROUP   MIN_MAX_PIXEL_VALUE         NA
#146                    GROUP RADIOMETRIC_RESCALING         NA
#147     RADIANCE_MULT_BAND_1                  <NA> 1.2483e-02
#148     RADIANCE_MULT_BAND_2                  <NA> 1.2730e-02
 str(dat1)
#'data.frame':    7 obs. of  3 variables:
# $ V2    : chr  "QUANTIZE_CAL_MIN_BAND_10" "QUANTIZE_CAL_MAX_BAND_11" "QUANTIZE_CAL_MIN_BAND_11" "END_GROUP" ...
# $ V3    : chr  NA NA NA "MIN_MAX_PIXEL_VALUE" ...
# $ NewCol: num  1 65535 1 NA NA ...

A.K.
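
Another way to separate the numeric entries from the text entries is to let as.numeric() decide, instead of testing for digits: a text value that happens to contain a digit would pass a digit test but still fails the conversion. A minimal sketch; the column names "line", "parameter" and "value" are made up here, and the data are the lines used above.

```r
txt <- "142,QUANTIZE_CAL_MIN_BAND_10,1
143,QUANTIZE_CAL_MAX_BAND_11,65535
144,QUANTIZE_CAL_MIN_BAND_11,1
145,END_GROUP,MIN_MAX_PIXEL_VALUE
146,GROUP,RADIOMETRIC_RESCALING
147,RADIANCE_MULT_BAND_1,1.2483E-02
148,RADIANCE_MULT_BAND_2,1.2730E-02"

# Read everything as text first; the first column becomes the row names
meta <- read.table(text = txt, sep = ",", header = FALSE,
                   col.names = c("line", "parameter", "value"),
                   colClasses = "character", row.names = 1)

# Entries that are not numbers become NA; the warning about that is expected
meta$numeric_value <- suppressWarnings(as.numeric(meta$value))
```

The original text stays in `value`, so nothing is lost for the rows that are not numbers.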


Hello, 

I am relatively new to the R community. 
I have a .txt file containing the metafile with information 
regarding Landsat calibration parameters. It contains 2 columns: one 
with the description of the parameter and the other with the value 
of the parameter. The problem is that the column with the values 
also contains words in some cases, which I believe makes 
read.table() read the column as non-numeric. 
This is an example of what it looks like: 

142  QUANTIZE_CAL_MIN_BAND_10  1                      142 
143  QUANTIZE_CAL_MAX_BAND_11  65535                  143 
144  QUANTIZE_CAL_MIN_BAND_11  1                      144 
145  END_GROUP                 MIN_MAX_PIXEL_VALUE    145 
146  GROUP                     RADIOMETRIC_RESCALING  146 
147  RADIANCE_MULT_BAND_1      1.2483E-02             147 
148  RADIANCE_MULT_BAND_2      1.2730E-02             148 
149  RADIANCE_MULT_BAND_3      1.1656E-02             149 

I need the left column to be read as numeric, does anyone have some good 
suggestion on how to approach this problem? 

Thank you in advance. 

Stefano



Re: [R] Reading a list of filenames from a csv file

2013-07-11 Thread arun
Hi,
Try this:
files1<-read.csv("files.csv",header=TRUE,stringsAsFactors=FALSE)
 str(files1)
#'data.frame':    2 obs. of  2 variables:
# $ Col1: chr  "ANA110915004A_3PERIOD_TmAvg-rdata.csv" 
"ANA110915006A_3PERIOD_TmAvg-rdata.csv"
# $ Col2: chr  "Pre-DA" "DA-10^-6"
files1
#   Col1 Col2
#1 ANA110915004A_3PERIOD_TmAvg-rdata.csv   Pre-DA
#2 ANA110915006A_3PERIOD_TmAvg-rdata.csv DA-10^-6

#Using some fake data

lapply(seq_len(nrow(files1)),function(i) 
{x1<-read.csv(file=files1[i,1],header=TRUE,sep="",check.names=FALSE);x1[files1[i,2]]})
#[[1]]
#  Pre-DA
#1  2
#2  3
#3  6
#4  4

#[[2]]
 # DA-10^-6
#1    9
#2   14
#3   13
#4   21


Hope this helps.
A.K.
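
To keep each heading together with its file inside a loop, it may help to loop over the rows of the file list and pick the filename and heading out by column name. The sketch below is self-contained: it writes two small made-up data files and a file list into a temporary directory, so the file names "run1.csv" and "run2.csv", the column names "filename" and "heading", and the data values are all invented for the illustration.

```r
# Set up a temporary directory with two fake data files and a file list
demo_dir <- tempfile("filelist_demo")
dir.create(demo_dir)
write.csv(data.frame(time = 1:3, value2 = c(0.1, 0.5, 0.2)),
          file.path(demo_dir, "run1.csv"), row.names = FALSE)
write.csv(data.frame(time = 1:3, value2 = c(0.3, 0.4, 0.9)),
          file.path(demo_dir, "run2.csv"), row.names = FALSE)
writeLines(c("run1.csv,Pre-DA", "run2.csv,DA-10^-6"),
           file.path(demo_dir, "files.csv"))

# The list has no header row; keep the columns as character, not factor
files <- read.csv(file.path(demo_dir, "files.csv"), header = FALSE,
                  col.names = c("filename", "heading"),
                  stringsAsFactors = FALSE)

results <- list()
for (i in seq_len(nrow(files))) {          # loop over rows, not columns
  dat <- read.csv(file.path(demo_dir, files$filename[i]), header = TRUE)
  results[[files$heading[i]]] <- dat
  # the heading is available here as the plot title, e.g.
  # plot(dat$time, dat$value2, type = "l", main = files$heading[i])
}
```

Note that length() of a data frame is its number of columns, so a loop over 1:length(files) would not visit every row.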



- Original Message -
From: Jannetta Steyn 
To: r-help 
Cc: 
Sent: Thursday, July 11, 2013 9:01 AM
Subject: [R] Reading a list of filenames from a csv file

What would be the best way to read a list of filenames and headings from a
csv file?

The CSV file is structured as two columns, with column one being the
filename and column two being a heading e.g.:
ANA110915004A_3PERIOD_TmAvg-rdata.csv,Pre-DA
ANA110915006A_3PERIOD_TmAvg-rdata.csv,DA-10^-6
ANA110915012A_3PERIOD_TmAvg-rdata.csv,DA-10^-4
ANA110915016A_3PERIOD_TmAvg-rdata.csv,Washout


I want to be able to open the file using read.csv and use the heading as
the header of a graph.

Reading the filenames from the directory with list.files() works but then I
don't have the headings that go with the file e.g.:
filenames<-list.files(pattern="*.csv")
for (i in seq_along(filenames)) {
  con<-read.csv(filenames[i], headers=TRUE, sep=',')
}

I tried the code below (which I posted in a different thread) but the
solutions that people offered me didn't get it to work. The code results in
'Error in read.table(file = file, header = header, sep = sep, quote =
quote,  :
  'file' must be a character string or connection

# Read filenames from csv file
files <- read.csv(file="files.csv",head=FALSE,sep=",")

# for each filename read the file
for (i in 1:length(files)) {
  # f becomes the next row inthe file
  f<-files[i,]
  # the header to be used for the graph is in column 2 of f
  head=f[2]
  par(mfrow=c(4,2))
  # the filename to be used is in column 1 of f
  con<-read.csv(file=f[1], header=TRUE, sep=',')
  tmp<-con$value2
  data<-normalize_js(tmp,-1,1)
  time<-con$time
  # run the waveform analyser
  waveformanalyser(data,time,head)
}

Regards
Jannetta

-- 

===
Web site: http://www.jannetta.com
Email: janne...@henning.org
===



Re: [R] LDA and confidence ellipse

2013-07-11 Thread arun
Hi,
Maybe this helps:
require(MASS)
require(ggplot2)
iris.lda<-lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + 
Petal.Width,  data = iris) 
datPred<-data.frame(Species=predict(iris.lda)$class,predict(iris.lda)$x)

library(ellipse)
dat_ell <- data.frame()
for(g in levels(datPred$Species)){
dat_ell <- rbind(dat_ell, 
cbind(as.data.frame(with(datPred[datPred$Species==g,], ellipse(cor(LD1, LD2), 
 scale=c(sd(LD1),sd(LD2)), 
 centre=c(mean(LD1),mean(LD2))))),Species=g))
}

ggplot(datPred, aes(x=LD1, y=LD2, col=Species) ) + geom_point( size = 4, 
aes(color = 
Species))+theme_bw()+geom_path(data=dat_ell,aes(x=x,y=y,color=Species),size=1,linetype=2)
  


A.K.


Hi, 

I wish to add confidence ellipse on my LDA result of the iris data set. 
Therefore: 
Is there statistical logic to do that as I only wish it to make the species 
separation more visable? 
How can I add it to the script below  (ggplot): 
require(MASS) 
require(ggplot2) 
iris.lda<-lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + 
Petal.Width,  data = iris) 
LD1<-predict(iris.lda)$x[,1] 
LD2<-predict(iris.lda)$x[,2] 
ggplot(iris, aes(x=LD1, y=LD2, col=iris$Species) ) + geom_point( size = 4, 
aes(color = iris$Species))+theme_bw()   

Could someone please help me. Thank you very much.



Re: [R] LDA and confidence ellipse

2013-07-11 Thread arun
Hi,
No problem.
The default should be 0.95
?ellipse()
level: The confidence level of a pairwise confidence region.  The
  default is 0.95, for a 95% region.  This is used to control
  the size of the ellipse being plotted.  A vector of levels
  may be used.

A.K.
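
To see what the level does, the outline of such a region can be computed by hand in base R. This is a sketch under an assumption of mine, namely that the region is a contour of a bivariate normal distribution fitted to the points; see ?ellipse for what the ellipse package does exactly. `conf_ellipse` is a name made up here, and the example uses two columns of iris rather than the LDA scores.

```r
# Points on the boundary of a bivariate-normal region of the given level
conf_ellipse <- function(x, y, level = 0.95, n = 100) {
  centre <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  radius <- sqrt(qchisq(level, df = 2))        # larger level -> larger ellipse
  theta <- seq(0, 2 * pi, length.out = n)
  unit_circle <- cbind(cos(theta), sin(theta))
  # chol(S) maps the unit circle onto an ellipse with covariance shape S
  pts <- sweep(radius * (unit_circle %*% chol(S)), 2, centre, "+")
  data.frame(x = pts[, 1], y = pts[, 2])
}

setosa <- iris[iris$Species == "setosa", ]
ell95 <- conf_ellipse(setosa$Sepal.Length, setosa$Sepal.Width)
ell50 <- conf_ellipse(setosa$Sepal.Length, setosa$Sepal.Width, level = 0.50)
```

The resulting data frames can be drawn with geom_path() in the same way as dat_ell; lowering the level shrinks the ellipse.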




- Original Message -
From: Lluis 
To: r-help@r-project.org
Cc: 
Sent: Thursday, July 11, 2013 3:15 PM
Subject: Re: [R] LDA and confidence ellipse

Hi,

Thanks works like magic.

BTW, what confidence level is used for the ellipses?





Re: [R] create new matrix from user-defined function

2013-07-11 Thread arun
Hi,
Not sure I understand you correctly.
I found it easier to index using number than replace it by lengthy column names.
You could do it similar to the one below.

matNew<-matrix(dat3[rowSums(dat3[c("B_MW_EEsDue1","C_MW_EEsDue2")])!=dat3["D_MW_EEsDueTotal"],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))

 matNew
# MW_EEsDue_ERRORS
#[1,] 1882
#[2,] 1884
#[3,] 1885

If you have very large dataset, you could also check ?data.table().


library(data.table)
dt3<- data.table(dat3)
dtNew<-subset(dt3[D_MW_EEsDueTotal!=B_MW_EEsDue1+C_MW_EEsDue2],select=1)
 dtNew
#   A_CaseID
#1: 1882
#2: 1884
#3: 1885


#Some speed comparisons:
set.seed(1254)
datTest<- data.frame(A=sample(1000:15000,1e7,replace=TRUE),B= 
sample(1:10,1e7,replace=TRUE),C=sample(5:15,1e7,replace=TRUE),D=sample(5:25,1e7,replace=TRUE))

system.time(res1<- data.frame(MW_EEsDue_ERRORS=datTest[datTest[[4]] != 
datTest[[2]]+datTest[[3]],][[1]]))
# user  system elapsed 
#  2.256   0.000   2.145 

system.time(mat1<-matrix(datTest[rowSums(datTest[,2:3])!=datTest[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS")))
 #  user  system elapsed 
 # 0.756   0.088   0.849 

system.time(res2<- 
data.frame(MW_EEsDue_ERRORS=datTest[addmargins(as.matrix(datTest[,2:3]),2)[,3]!=datTest[,4],1]))
#   user  system elapsed 
#115.740   0.000 105.778 

dtTest<- data.table(datTest)
system.time(res3<- subset(dtTest[D!=B+C],select=1))
 # user  system elapsed 
 # 0.508   0.000   0.477 

identical(res1,res2)
#[1] TRUE
setnames(res3,"A","MW_EEsDue_ERRORS")
 identical(res1,as.data.frame(res3))
#[1] TRUE
A.K.




- Original Message -
From: bcrombie 
To: r-help@r-project.org
Cc: 
Sent: Thursday, July 11, 2013 3:54 PM
Subject: Re: [R] create new matrix from user-defined function

Dan and Arun, thank you very much for your replies.  They are both very helpful 
and I love to get different versions of an answer so I can learn more R code.  
You both used indexing to refer to the columns needed in the function, but 
since my real data frame will be much larger I'm assuming I can replace the 
index numbers with the names of the columns in quotes instead?   I'll try this 
on my own if you're busy with other forum questions.  Thanks, again.

From: Nordlund, Dan (DSHS/RDA) [via R] 
[mailto:ml-node+s789695n4671267...@n4.nabble.com]
Sent: Wednesday, July 10, 2013 5:46 PM
To: Crombie, Burnette N
Subject: Re: create new matrix from user-defined function

> -Original Message-
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of bcrombie
> Sent: Wednesday, July 10, 2013 12:19 PM
> To: [hidden email]
> Subject: [R] create new matrix from user-defined function
>
> #Let's say I have the following data set:
>
> dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
>                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
>                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
>                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))
> dat3
> # A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal
> # 1     1881            2            5                7
> # 2     1882            2            5                9
> # 3     1883            1            4                5
> # 4     1884            4            1                6
> # 5     1885            6            6              112
>
> # I want to:
> #CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s
> WHERE "D
> != B + C"
> #THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this
> example
> should be:
>
> # MW_EEsDue_ERRORS
> # 1 1882
> # 2 1884
> # 3 1885
>
> #What is the best way to do this?  Thanks for your time.  BNC
>
>

Here is one option, there are many others.  Only you can decide what is "best".

data.frame(MW_EEsDue_ERRORS=dat3[dat3[[4]] != dat3[[2]]+dat3[[3]],][[1]])


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


Re: [R] create new matrix from user-defined function

2013-07-11 Thread arun
Hi BNC,
No problem.
You could also use ?with() 

data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))]))
#  MW_EEsDue_ERRORS
#1 1882
#2 1884
#3 1885
A.K.



- Original Message -
From: "Crombie, Burnette N" 
To: arun 
Cc: R help 
Sent: Thursday, July 11, 2013 4:40 PM
Subject: RE: [R] create new matrix from user-defined function

You understood me perfectly, and I agree is it easier to index using numbers 
than names.  I'm just afraid if my dataset gets too big I'll mess up which 
index numbers I'm supposed to be using.  "data.table()" looks very useful and a 
good way to approach the issue.  Thanks.  I really appreciate your (everyone's) 
help.  BNC

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Thursday, July 11, 2013 4:29 PM
To: Crombie, Burnette N
Cc: R help
Subject: Re: [R] create new matrix from user-defined function

Hi,
Not sure I understand you correctly.
I found it easier to index using number than replace it by lengthy column names.
You could do it similar to the one below.

matNew<-matrix(dat3[rowSums(dat3[c("B_MW_EEsDue1","C_MW_EEsDue2")])!=dat3["D_MW_EEsDueTotal"],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))

 matNew
# MW_EEsDue_ERRORS
#[1,] 1882
#[2,] 1884
#[3,] 1885

If you have very large dataset, you could also check ?data.table().


library(data.table)
dt3<- data.table(dat3)
dtNew<-subset(dt3[D_MW_EEsDueTotal!=B_MW_EEsDue1+C_MW_EEsDue2],select=1)
 dtNew
#   A_CaseID
#1: 1882
#2: 1884
#3: 1885


#Some speed comparisons:
set.seed(1254)
datTest<- data.frame(A=sample(1000:15000,1e7,replace=TRUE),B= 
sample(1:10,1e7,replace=TRUE),C=sample(5:15,1e7,replace=TRUE),D=sample(5:25,1e7,replace=TRUE))

system.time(res1<- data.frame(MW_EEsDue_ERRORS=datTest[datTest[[4]] != 
datTest[[2]]+datTest[[3]],][[1]]))
# user  system elapsed
#  2.256   0.000   2.145 

system.time(mat1<-matrix(datTest[rowSums(datTest[,2:3])!=datTest[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS")))
 #  user  system elapsed
 # 0.756   0.088   0.849 

system.time(res2<- 
data.frame(MW_EEsDue_ERRORS=datTest[addmargins(as.matrix(datTest[,2:3]),2)[,3]!=datTest[,4],1]))
#   user  system elapsed
#115.740   0.000 105.778 

dtTest<- data.table(datTest)
system.time(res3<- subset(dtTest[D!=B+C],select=1))
 # user  system elapsed
 # 0.508   0.000   0.477 

identical(res1,res2)
#[1] TRUE
setnames(res3,"A","MW_EEsDue_ERRORS")
 identical(res1,as.data.frame(res3))
#[1] TRUE
A.K.




- Original Message -
From: bcrombie 
To: r-help@r-project.org
Cc: 
Sent: Thursday, July 11, 2013 3:54 PM
Subject: Re: [R] create new matrix from user-defined function

Dan and Arun, thank you very much for your replies.  They are both very helpful 
and I love to get different versions of an answer so I can learn more R code.  
You both used indexing to refer to the columns needed in the function, but 
since my real data frame will be much larger I'm assuming I can replace the 
index numbers with the names of the columns in quotes instead?   I'll try this 
on my own if you're busy with other forum questions.  Thanks, again.
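
Regarding the question above about replacing index numbers with quoted column names: a minimal sketch, assuming the same toy `dat3` as in the original post. The object names `by_index`, `by_name` and `errors` are only illustrative. Indexing by name gives the same rows as indexing by position, and it keeps working if the columns are later reordered.

```r
# Toy data from the original post
dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

# Index by position, as in the earlier replies
by_index <- dat3[dat3[[4]] != dat3[[2]] + dat3[[3]], 1]

# Index by quoted column name: same test, independent of column order
by_name <- dat3[dat3[["D_MW_EEsDueTotal"]] !=
                  dat3[["B_MW_EEsDue1"]] + dat3[["C_MW_EEsDue2"]],
                "A_CaseID"]

# One-column matrix with the requested column label
errors <- matrix(by_name, ncol = 1,
                 dimnames = list(NULL, "MW_EEsDue_ERRORS"))
```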

From: Nordlund, Dan (DSHS/RDA) [via R] 
[mailto:ml-node+s789695n4671267...@n4.nabble.com]
Sent: Wednesday, July 10, 2013 5:46 PM
To: Crombie, Burnette N
Subject: Re: create new matrix from user-defined function

> -Original Message-
> From: [hidden email] [mailto:r-help-bounces@r-project.org]
> On Behalf Of bcrombie
> Sent: Wednesday, July 10, 2013 12:19 PM
> To: [hidden email]
> Subject: [R] create new matrix from user-defined function
>
> #Let's say I have the following data set:
>
> dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
>                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
>                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
>                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))
> dat3
> #   A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal
> # 1     1881            2            5                7
> # 2     1882            2            5                9
> # 3     1883            1            4                5
> # 4     1884            4            1                6
> # 5     1885            6            6              112
>
> # I want to:
> #CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s 
> WHERE "D != B + C"
> #THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this 
> example should be:
>
> # MW_EEsDue_ERRORS
> # 1 1882
> # 2 1884
> # 3 1885
>
> #What is the best way to do this?  Thanks for your time.  BNC
>

Re: [R] Help with IF command strings

2013-07-11 Thread arun
HI,
Try this:
set.seed(485)
dat1<- as.data.frame(matrix(sample(0:10,26*10,replace=TRUE),ncol=26))
mean(dat1$V21[dat1$V2==1|dat1$V2==0])
#[1] 3.8
#or
with(dat1,mean(V21[V2==1|V2==0]))
#[1] 3.8
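
If a separate mean per condition is wanted as well as the pooled mean, a small sketch on made-up data may help. The data frame `scores` and its values are hypothetical; only the column names `V2` and `V21` come from the question.

```r
# Hypothetical data: V2 is the condition, V21 the measurement
scores <- data.frame(V2  = c(0, 1, 0, 1, 2, 0),
                     V21 = c(4, 6, 2, 10, 7, 3))

# Keep only participants whose condition is 0 or 1
sub <- scores[scores$V2 %in% c(0, 1), ]

# Pooled mean over both conditions
pooled <- mean(sub$V21)

# One mean per condition
by_cond <- tapply(sub$V21, sub$V2, mean)
```

The same subsetting pattern carries over to other summaries (median, sd, and so on) by swapping the function passed to tapply().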


A.K.


I have data in 26 columns, I'm trying to get a mean for column 21 only for the 
participants that are either 0 or 1 in column 2. 

One of the commands I tried looked something like this 

mean(data1$V21, if(V2 = 1))   

So basically I need to have the program run a mean (and later 
other forms of analysis) on participants based on their condition. 
either 0 or 1. 

Help is greatly appreciated. 

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with IF command strings

2013-07-12 Thread arun
Hi,

Not sure I understand your question.
Suppose `data1` is your real data, but if the column names are different, 
change "V21", "V2" by those in the real data. Based on your initial post, the 
column names seemed to be the same.
mean(data1$V21[data1$V2==1|data1$V2==0])

A.K.  


What values would I substitute by real data.  I did everything the way 
you posted, and I got 3.8 as well.  So I'm curious what values I would 
change to get the mean for the actual data? 




Re: [R] Replicating Rows

2013-07-12 Thread arun
Hi,

apple<- read.table(text="
Fam.name,Item,AMT.SALE.NET.PROMO,X.CY..QTY.SALE.TOT
9475,Imported Fruits,22110276001,0,436
9499,Imported Fruits,22110277001,0,236
9523,Imported Fruits,22110278001,0,71 
",sep=",",header=TRUE,stringsAsFactors=FALSE)
str(apple)
#'data.frame':    3 obs. of  4 variables:
# $ Fam.name  : chr  "Imported Fruits" "Imported Fruits" "Imported Fruits"
# $ Item  : num  2.21e+10 2.21e+10 2.21e+10
# $ AMT.SALE.NET.PROMO: int  0 0 0
# $ X.CY..QTY.SALE.TOT: num  436 236 71

Here, it changed the class of some of the variables.
new<-sapply(apple[,-4],rep,apple[,4]) 
str(as.data.frame(new,stringsAsFactors=FALSE))
#'data.frame':    743 obs. of  3 variables:
# $ Fam.name  : chr  "Imported Fruits" "Imported Fruits" "Imported Fruits" "Imported Fruits" ...
# $ Item  : chr  "22110276001" "22110276001" "22110276001" "22110276001" ...
# $ AMT.SALE.NET.PROMO: chr  "0" "0" "0" "0" ...



new1<-apple[rep(seq_len(nrow(apple)),apple[,4]),-4]
 row.names(new1)<- 1:nrow(new1)
 str(new1)
#'data.frame':    743 obs. of  3 variables:
# $ Fam.name  : chr  "Imported Fruits" "Imported Fruits" "Imported Fruits" "Imported Fruits" ...
# $ Item  : num  2.21e+10 2.21e+10 2.21e+10 2.21e+10 2.21e+10 ...
# $ AMT.SALE.NET.PROMO: int  0 0 0 0 0 0 0 0 0 0 ...
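
A smaller sketch of the row-index approach, on hypothetical data (`fruit`, `name`, `price` and `qty` are made-up names). Repeating row indices keeps each column's class, and the number of rows in the result can be checked against the sum of the quantities.

```r
# Hypothetical data: one row per item, qty says how often to repeat it
fruit <- data.frame(name  = c("a", "b"),
                    price = c(1.5, 2),
                    qty   = c(2L, 3L),
                    stringsAsFactors = FALSE)

# Repeat each row index qty times, then drop the qty column
expanded <- fruit[rep(seq_len(nrow(fruit)), fruit$qty), c("name", "price")]
rownames(expanded) <- NULL

# The result should have sum(qty) rows
n_expected <- sum(fruit$qty)
```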
A.K.




I am trying to replicate each row according to its quantity, so the number of 
rows in the result should equal the sum of the quantities. Is there anything 
wrong with my code? Thanks. 

apple 
            Fam.name        Item AMT.SALE.NET.PROMO X.CY..QTY.SALE.TOT 
9475 Imported Fruits 22110276001                  0                436 
9499 Imported Fruits 22110277001                  0                236 
9523 Imported Fruits 22110278001                  0                 71 
9552 Imported Fruits 22110306001                  0                 69 
9571 Imported Fruits 22110314001                  0                 20 
9579 Imported Fruits 22110315001                  0                 80 
9604 Imported Fruits 22110317001                  0                 61 
9635 Imported Fruits 22110321001                  0               1026 
9697 Imported Fruits 22110334001                  0                223 
9720 Imported Fruits 22110335001                  0                214 
9744 Imported Fruits 22110336001                  0                102 
9768 Imported Fruits 22110337001                  0                146 
9868 Imported Fruits 22110354001              118.8                 17 
9893 Imported Fruits 22110360001                  0                 43 
9904 Imported Fruits 22110363001                  0                 49 
9920 Imported Fruits 22110364001                  0                  1 
9938 Imported Fruits 22110365001              205.4                 33 

new<-sapply(apple[,-4],rep,apple[,4]) 
nrow(new) 
[1] 33572


Re: [R] Needing help for excluding vector elements

2013-07-12 Thread arun
Hi,
Try:
set.seed(41)
vec1<- sample(1:50,12000,replace=TRUE)
tail(vec1,-1000)
length(tail(vec1,-1000))
#[1] 11000
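
For the more general case ("all the values up to the ith"), a sketch with a hypothetical helper `drop_first()`. One thing to watch is i = 0: both tail(v, -0) and v[-seq_len(0)] return an empty vector, so the helper treats that case separately.

```r
# Hypothetical helper: drop the first i elements of a vector
drop_first <- function(v, i) {
  if (i > 0) v[-seq_len(i)] else v
}

v <- 1:10
rest <- drop_first(v, 3)   # elements 4 to 10
same <- drop_first(v, 0)   # nothing dropped
```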


A.K.




- Original Message -
From: Olivier Charansonney 
To: r-help@r-project.org
Cc: 
Sent: Friday, July 12, 2013 6:06 AM
Subject: [R] Needing help for excluding vector elements

Hello,

R for Dummies.

How can I exclude the first 1000 values of a vector (length 12000)? More
generally all the values up to the ith?

Thanks for your help,



Dr Olivier Charansonney

Cardiologue

Centre Hospitalier Sud-Francilien, Corbeil-Essonnes, France





Re: [R] Help with IF command strings

2013-07-12 Thread arun
Hi,
Regarding the 2nd issue of mean=3.8 being "too high", could you explain why?
#Using the same example:
 dat1$V21[dat1$V2==1|dat1$V2==0]
#[1]  6  2  1 10  0
 (6+2+1+10+0)/5
#[1] 3.8
 mean(dat1$V21[dat1$V2==1|dat1$V2==0])
#[1] 3.8

About missing data:
set.seed(55)
dat2<- as.data.frame(matrix(sample(c(NA,0:4),26*10,replace=TRUE),ncol=26))  # new example dataset
 dat2$V2
 #[1]  4 NA  0  0  1  3  2  4  2  1
dat2$V21
 #[1] NA  3  0  0  2  0  4  0  3 NA
(dat2$V2==1|dat2$V2==0) &!is.na(dat2$V2)
# [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
 dat2$V21[(dat2$V2==1|dat2$V2==0) &!is.na(dat2$V2)]
#[1]  0  0  2 NA
mean(dat2$V21[(dat2$V2==1|dat2$V2==0) &!is.na(dat2$V2)],na.rm=TRUE)
#[1] 0.667
 (0+0+2)/3
#[1] 0.667
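
A shorter way to write the same filter, sketched on the V2 and V21 values printed above (the data frame name `dat` is only illustrative). `%in%` never returns NA, so the explicit is.na() test on V2 is not needed; na.rm=TRUE still handles missing values in V21.

```r
# V2 and V21 as printed in the example above
dat <- data.frame(V2  = c(4, NA, 0, 0, 1, 3, 2, 4, 2, 1),
                  V21 = c(NA, 3, 0, 0, 2, 0, 4, 0, 3, NA))

# TRUE only where V2 is 0 or 1; an NA in V2 gives FALSE, not NA
keep <- dat$V2 %in% c(0, 1)

# Mean of V21 for those rows, ignoring missing V21 values
cond_mean <- mean(dat$V21[keep], na.rm = TRUE)
```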


If this doesn't solve the problem, please provide a reproducible example using 
?dput() 
ex:
dput(head(dataset,20))

A.K.



When I enter that formula I get "NA" or "NaN" as an answer.  I have some 
missing data, which was entered in as NA, so I'm not sure if that is the
 problem.  Originally I thought I would need to do the entire set of 
equations you posted, but that gave me 3.8 as a mean, which I know is 
too high to be the mean for this data set. 

Thanks 





Re: [R] replace multiple values in vector at once

2013-07-12 Thread arun
Hi,
library(car)
 recode(x,"'x'=1;'y'=2;'z'=3")
#[1] 1 1 1 2 2 2 3 3 3
#or
as.numeric(factor(x))
#[1] 1 1 1 2 2 2 3 3 3
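
A base R alternative, sketched with a named lookup vector (the names `lookup` and `recoded` are illustrative). Unlike as.numeric(factor(x)), the mapping is written out explicitly, so it does not depend on the alphabetical order of the values.

```r
x <- c(rep('x', 3), rep('y', 3), rep('z', 3))

# Old value -> new value, spelled out explicitly
lookup <- c(x = 1, y = 2, z = 3)

# Index the lookup by the old values; unname() drops the carried-over names
recoded <- unname(lookup[x])
```

The replacement values can just as well be characters, e.g. c(x = "a", y = "b", z = "c").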
A.K.




- Original Message -
From: Trevor Davies 
To: "r-help@r-project.org" 
Cc: 
Sent: Friday, July 12, 2013 5:56 PM
Subject: Re: [R] replace multiple values in vector at once

I always think that replying to your own r-help feels silly but it's good
to close these things out.

here's my hack solution:

x1<-merge(data.frame(A=x),data.frame(A=c('x','y','z'),B=c(1,2,3)),by='A')[,2]

Well that works and should for my more complex situation.  If anyone has
something a little less heavy handed I'd love to hear it.

Have a great weekend.


On Fri, Jul 12, 2013 at 2:18 PM, Trevor Davies wrote:

>
> I'm trying to find a function that can replace multiple instances of
> values or characters in a vector in a one step operation.  As an example,
> the vector:
>
> x <- c(rep('x',3),rep('y',3),rep('z',3))
>
> > x
> [1] "x" "x" "x" "y" "y" "y" "z" "z" "z"
>
> I would simply like to replace all of the x's with 1's, y:2 & z:3 (or
> other characters).
> i.e:
> > x
> [1] "1" "1" "1" "2" "2" "2" "3" "3" "3"
>
> Of course, I'm aware of the replace function but this obviously gets a
> little unwieldy when there are :
> x<-replace(x,x=='x',1)
> x<-replace(x,x=='y',2)
> x<-replace(x,x=='z',3)
>
> but I can't figure out how to do it in a one stop operation.  My real
> needs is more complex obviously.  This is one of those seemingly simple
> r-operations that should be obvious but I'm coming up empty on this one.
>
> Thanks for the help.
> Trevor
>


Re: [R] create new matrix from user-defined function

2013-07-12 Thread arun
Hi,
One alternative would be to change colnames:

colnames(dat3)<-1:4

 data.frame(MW_EEsDue_ERRORS=with(dat3,`1`[`4`!=rowSums(cbind(`2`,`3`))]))
  #MW_EEsDue_ERRORS
#1 1882
#2 1884
#3 1885


Also, check these:
with(dat3,4)
#[1] 4
 with(dat3,`4`)
#[1]   7   9   5   6 112
with(dat3,7)
#[1] 7
 with(dat3,`7`)
#Error in eval(expr, envir, enclos) : object '7' not found
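
To see why the scripts with bare numbers return every row: inside with(), an unquoted 4 is just the constant 4, so the test compares 4 with 2 + 3 and yields a single TRUE that is recycled over all rows. A sketch using the original `dat3` (the other object names are illustrative):

```r
dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

# Constants only: 4 != 5 is a single TRUE
cond_const <- 4 != rowSums(cbind(2, 3))
all_ids <- dat3$A_CaseID[cond_const]      # TRUE is recycled, every row kept

# Positions used to extract columns: one logical value per row
cond_cols <- dat3[[4]] != rowSums(dat3[2:3])
wanted <- dat3$A_CaseID[cond_cols]
```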


A.K.

- Original Message -
From: bcrombie 
To: r-help@r-project.org
Cc: 
Sent: Friday, July 12, 2013 4:45 PM
Subject: Re: [R] create new matrix from user-defined function

AK, I decided to convert your “with” statement back to index-by-number, and I 
did look up the ?with help info, but I’m confused about my replacement code 
below.  I got the wrong answer (R didn’t apply the function to my column 1 
variable “A_CaseID”).  What am I doing wrong?  Do I need to change my function 
code re: index “4” (otherwise known as “D_MW_EEsDueTotal” --- my attempts at 
that have failed also)?  thanks.

#this is your correct code

> data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))]))

#  MW_EEsDue_ERRORS

#1             1882

#2             1884

#3             1885

#these are my incorrect scripts
> data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[4!=rowSums(cbind(2,3))]))
#  MW_EEsDue_ERRORS
#1             1881
#2             1882
#3             1883
#4             1884
#5             1885


> data.frame(MW_EEsDue_ERRORS=with(dat3,dat3[[1]][4!=rowSums(cbind(2,3))]))

#  MW_EEsDue_ERRORS

#1             1881

#2             1882

#3             1883

#4             1884

#5             1885


> data.frame(MW_EEsDue_ERRORS=with(dat3,1[4!=rowSums(cbind(2,3))]))

#  MW_EEsDue_ERRORS

#1                1

Original database:
dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                  B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                  C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                  D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))
dat3
# A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal
# 1     1881            2            5                7
# 2     1882            2            5                9
# 3     1883            1            4                5
# 4     1884            4            1                6
# 5     1885            6            6              112


From: arun kirshna [via R] [mailto:ml-node+s789695n4671365...@n4.nabble.com]
Sent: Thursday, July 11, 2013 4:55 PM
To: Crombie, Burnette N
Subject: Re: create new matrix from user-defined function

Hi BNC,
No problem.
You could also use ?with()

data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))]))
#  MW_EEsDue_ERRORS
#1             1882
#2             1884
#3             1885
A.K.







Re: [R] multi-condition summing puzzle

2013-07-12 Thread arun
Hi,
Maybe this helps:

dat1<- read.table(text="
ID county date company 
1   x  1   comp1
2   y  1   comp3
3   y  2   comp1
4   y  3   comp1
5    x  2  comp2
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<- dat1
dat1$answer<-unsplit(lapply(split(dat1,dat1$county),function(x) 
do.call(rbind,lapply(seq_len(nrow(x)),function(i) {x1<-x[1:i,]; 
x2<-table(x1$company)/sum(table(x1$company));sum(x2^2)}))),dat1$county)
 dat1
#  ID county date company    answer
#1  1  x    1   comp1 1.000
#2  2  y    1   comp3 1.000
#3  3  y    2   comp1 0.500
#4  4  y    3   comp1 0.556
#5  5  x    2   comp2 0.500

#or
dat2$answer<-with(dat2,unlist(ave(company,county,FUN=function(x) 
lapply(seq_along(x),function(i) {x1<-table(x[1:i]);sum((x1/sum(x1))^2)}))))
 dat2
#  ID county date company    answer
#1  1  x    1   comp1 1.000
#2  2  y    1   comp3 1.000
#3  3  y    2   comp1 0.500
#4  4  y    3   comp1 0.556
#5  5  x    2   comp2 0.500
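
The same calculation can be split into a small helper, which may be easier to read. This is a sketch only: `running_hhi()` is a hypothetical name, and it assumes the rows are already ordered by date within each county, as in the example data.

```r
# Hypothetical helper: for each position i, the sum of squared shares
# of the companies seen in rows 1..i
running_hhi <- function(company) {
  vapply(seq_along(company), function(i) {
    shares <- table(company[seq_len(i)]) / i
    sum(shares^2)
  }, numeric(1))
}

dat <- data.frame(ID = 1:5,
                  county = c("x", "y", "y", "y", "x"),
                  date = c(1, 1, 2, 3, 2),
                  company = c("comp1", "comp3", "comp1", "comp1", "comp2"),
                  stringsAsFactors = FALSE)

# Apply the helper within each county and put results back in row order
pieces <- lapply(split(dat$company, dat$county), running_hhi)
dat$answer <- unsplit(pieces, dat$county)
```

For ID 4 this gives (2/3)^2 + (1/3)^2 = 5/9, which the question rounds to 0.6.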

A.K.

Hi - 

I have a seemingly complex data summarizing problem that I am having a hard 
time wrapping my mind around. 

What I'm trying to do is sum the squares of all company market 
shares in a given county, UP TO the corresponding date. A company's market 
share is defined as: Number of company observations / Total observations. 

Here is example data and desired answer: 

ID  county  date  company  answer
1      x      1        comp1           1
2      y      1        comp3           1
3      y      2        comp1           0.5
4      y      3        comp1           0.6
5       x     2       comp2           0.5

For example, to get the answer for ID 4, we look at county y, dates 1, 2, 3 and 
sum:  [(2/3)comp1]^2 +[(1/3)comp3]^2 = 0.6 

I've tried cumsum, but am simply stuck given all of the 
different conditions.  I have a large matrix of data for this with 
several hundred companies, tens of counties and unique dates. 

Any help would be extremely appreciated. 

Thank you,


Re: [R] How to set panel data format

2013-07-13 Thread arun


Hi,

as.integer(dat$COUNTRY) # would be the easiest (Rui's solution).

Other options could be also used:
library(plyr)
 
as.integer(mapvalues(dat$COUNTRY,levels(dat$COUNTRY),seq(length(levels(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
#or
match(dat$COUNTRY,levels(dat$COUNTRY))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4


#if `COUNTRY` is not factor

dat$COUNTRY<- as.character(dat$COUNTRY)
 
as.integer(mapvalues(dat$COUNTRY,unique(dat$COUNTRY),seq(length(unique(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

#or (if it is sorted already)
 (seq_along(dat$COUNTRY)-1)%/%as.vector(table(dat$COUNTRY))+1
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
A.K.


- Original Message -
From: Rui Barradas 
To: serenamas...@gmail.com
Cc: 'r-help' 
Sent: Saturday, July 13, 2013 12:04 PM
Subject: Re: [R] How to set panel data format

Hello,

It's better if you keep this on the list; the odds of getting more and 
better answers are greater.

Inline.

On 13-07-2013 15:38, serenamas...@gmail.com wrote:
> Hi Rui,
> thanks for your reply.
>
> No, my problem isn't one of reshaping. It is just that I want R to know I 
> have a panel and not just cross sections or time series.
>
> In other words If I had cross section data:
>
> COUNTRY   YEAR   GDP
> Albania        1999     3
> Barbados    1999     5
> Congo          1999     1
> Denmark    1999     11
> etc.                ..             ..
>
> My ID here is country, but every observation is a new cluster independent of 
> each other, so I don't care to let R know because the ID is a unique 
> identifier.
>
> Whereas if I have a panel
>
> COUNTRY   YEAR   GDP
> Albania        1999      3
> Albania        2000      3.5
> Albania        2001      3.7
> Albania        2002      4
> Albania        2003      4.5
> Barbados   1999       5
> Barbados   2000       5
> Barbados   2001       5.1
> Barbados   2002       4
> Barbados   2003       3
> Congo         1999      1
> Congo         2000      2
> Congo         2001      2
> Congo         2002      3
> Congo         2003      4
> Denmark    1999     11
> Denmark    2000     12
> Denmark    2001     13
> Denmark    2002     10
> Denmark    2003     10
> etc.                ..             ..
>
> How am I going to tell R that Albania is one same ID for all the 5 years I 
> have in the panel, in other words, Albania has to be identified by the same 
> number in the "factor" vector which R codes it with. Then Barbados is ID 2 in 
> all its years, Congo has ID 3 and so on.

R already does that, factors are coded as integers:

as.integer(dat$COUNTRY) # Albania is 1, etc


> In STATA, you sort 'by country year' and the program knows it is a panel of 
> entities observed more than once over time.  But I am not sure how to let R 
> know the same.
>
> In practice the reason why it is important to define where a country ends and 
> where a new begins is because
>
> 1) if one creates lags of variables and the program doesn't know where the 
> boundaries between countries are, the lag for the first year of Barbados in 
> my previous example will be calculated using the last year of Albania, that 
> is, the preceding country.

A way of doing this, equivalent to the previous line of code if the 
countries are grouped consecutively, is

cumsum(c(TRUE, dat$COUNTRY[-nrow(dat)] != dat$COUNTRY[-1L]))
>
> 2) I need to create countrydummies that take the value of 1 whenever a 
> country ID is equal to 1, so if Albania has 5 years of observations and each 
> of the year observations appears with a different ID, the country dummies 
> will not be created. Instead if Albania has the same country identifier (1) 
> for all the years in which it is observed, the country dummy will be the same 
> and ==1 whenever Albania is the country observed

I doubt you need to create dummies, R does it for you when you create a 
factor. Internally, factors are coded as integers, so all you need is to 
coerce them to integer like I've said earlier.
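
A minimal sketch of both points on a cut-down version of the panel (two countries, three years; the column names `id` and `GDP_lag1` are made up for illustration). The lag is computed within each country, so the first year of Barbados gets NA rather than the last Albanian value. It assumes the rows are sorted by year within country.

```r
dat <- data.frame(COUNTRY = rep(c("Albania", "Barbados"), each = 3),
                  YEAR = rep(1999:2001, 2),
                  GDP = c(3, 3.5, 3.7, 5, 5, 5.1),
                  stringsAsFactors = FALSE)

# Integer country identifier, the same for all years of a country
dat$id <- as.integer(factor(dat$COUNTRY))

# One-period lag of GDP computed separately within each country
dat$GDP_lag1 <- ave(dat$GDP, dat$COUNTRY,
                    FUN = function(g) c(NA, head(g, -1)))

# Explicit 0/1 country dummies, if they are needed outside a model formula
dummies <- model.matrix(~ COUNTRY - 1, data = dat)
```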

Rui Barradas

>
> Hope this makes it clearer,
> Thanks,
> Serena
>
> _
> Sent from http://r.789695.n4.nabble.com
>


Re: [R] simplify a dataframe

2013-07-13 Thread arun
Hi,
"when the value of Debut of lines i = value Fin of lines i-1"
That part is not clear, especially when compared with the expected output 
(df2).  Also, in your example dataset:

df1$contrat[grep("^CDD",df1$contrat)]
#[1] "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" "CDD détaché ext. Cirad"
#[4] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad"  "CDD détaché ext. Cirad"
#[7] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad"  "CDD détaché ext. Cirad"
##Looks like there are extra spaces in some of them.  I guess these are the same
df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad"


I tried this:
indx<-as.numeric(interaction(df1[,1:6],drop=FALSE))

 df1New<- df1
res2<-unique(within(df1New,{Debut<-ave(seq_along(indx),indx,FUN=function(x) 
Debut[head(x,1)]);Fin<- ave(seq_along(indx),indx,FUN=function(x) 
Fin[tail(x,1)])}))
 row.names(res2)<- 1:nrow(res2)

res2[,c(1,2,7:8)]
   Matricule    Nom  Debut    Fin
1  1  VERON 24/01/1995 31/12/1997
2  6 BENARD 02/02/1995 12/03/1995
3  6 BENARD 13/03/1995 31/01/1996 ###here not correct
4  8 DALNIC 24/01/1995 31/08/1995
5  8 DALNIC 01/09/1995 29/02/2000
6    934  FORNI 26/01/1995 31/08/2001
7    934  FORNI 01/09/2001 31/08/2004
8    934  FORNI 01/09/2004 31/08/2007
9    934  FORNI 01/09/2007 04/09/2012
10   934  FORNI 05/09/2012 31/12/4712


df2[,c(1,2,7:8)]
   Mat    Nom  Debut    Fin
1    1  VERON 24/01/1995 31/12/1997
2    6 BENARD 02/02/1995 12/03/1995
3    6 BENARD 13/03/1995 30/06/1995
4    6 BENARD 01/01/1996 31/01/1996 #missing this row 
5    8 DALNIC 24/01/1995 31/08/1995
6    8 DALNIC 01/09/1995 29/02/2000
7  934  FORNI 26/01/1995 31/08/2001
8  934  FORNI 01/09/2001 31/08/2004
9  934  FORNI 01/09/2004 31/08/2007
10 934  FORNI 01/09/2007 04/09/2012
11 934  FORNI 05/09/2012 31/12/4712


Here, the dates look similar to the ones on df2 except for one row in df2.

A.K.




- Original Message -
From: Arnaud Michel 
To: R help 
Cc: 
Sent: Friday, July 12, 2013 3:45 PM
Subject: [R] simplify a dataframe

Hello

I have the following problem : group the lines of a dataframe when no 
information change (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) 
and when the value of Debut of lines i = value Fin of lines i-1
I can obtain it with a do loop. Is it possible to avoid the loop ?

The dataframe initial is df1
dput(df1)
structure(list(Matricule = c(1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L,
6L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 934L, 934L, 934L, 934L,
934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L,
934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L,
934L, 934L, 934L, 934L), Nom = c("VERON", "VERON", "VERON", "BENARD",
"BENARD", "BENARD", "BENARD", "BENARD", "BENARD", "DALNIC", "DALNIC",
"DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC",
"FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI",
"FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI",
"FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI",
"FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI",
"FORNI", "FORNI"), Sexe = c("Féminin", "Féminin", "Féminin",
"Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin",
"Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin",
"Féminin", "Féminin", "Féminin", "Masculin", "Masculin", "Masculin",
"Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin",
"Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin",
"Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin",
"Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin",
"Masculin", "Masculin", "Masculin"), DateNaissance = c("02/09/1935",
"02/09/1935", "02/09/1935", "01/04/1935", "01/04/1935", "01/04/1935",
"01/04/1935", "01/04/1935", "01/04/1935", "19/02/1940", "19/02/1940",
"19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940",
"19/02/1940", "19/02/1940", "10/07/1961", "10/07/1961", "10/07/1961",
"10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961",
"10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961",
"10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961",
"10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961",
"10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961",
"10/07/1961", "10/07/1961"), contrat = c("CDI commun", "CDI commun",
"CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun",
"CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun",
"CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun",
"CDI commun", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad",
"CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. 
Cirad",
"CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. 
Cirad",
"CDD détaché ext. Cirad", "CDI Détachés Autres", "CDI Détachés Autres",
"CDI Détachés Autres", "CDI Détachés Autres", "C

Re: [R] Test for column equality across matrices

2013-07-13 Thread arun
I tried it on a slightly bigger dataset:
A1 <- matrix(t(expand.grid(1:90, 15, 16)), nrow = 3)
B1 <- combn(90, 3)
which(is.element(columnsOf(B1), columnsOf(A1)))
# [1]  1067  4895  8636 12291 15861 19347 22750 26071 29311 32471 35552 38555
#[13] 41481


which(apply(t(B1),1,paste,collapse="")%in%apply(t(A1),1,paste,collapse=""))
# [1]  1067  4895  8636 12291 15861 19347 22750 26071 29311 32471 35552 38555
#[13] 41481 44331


B1[,44331]
#[1] 14 15 16


which(apply(t(A1),1,paste,collapse="")=="141516")
#[1] 14

B1New<-B1[,!apply(t(B1),1,paste,collapse="")%in%apply(t(A1),1,paste,collapse="")]
newB <- B1[ , !is.element(columnsOf(B1), columnsOf(A1))]
 identical(B1New,newB)
#[1] FALSE

 is.element(B1[,44331],A1[,14])
#[1] TRUE TRUE TRUE


 B1Sp<-columnsOf(B1)
B1Sp[[44331]]
#[1] 14 15 16
 A1Sp<- columnsOf(A1)
 A1Sp[[14]]
#[1] 14 15 16
 is.element(B1Sp[[44331]],A1Sp[[14]])
#[1] TRUE TRUE TRUE


A.K.



- Original Message -
From: William Dunlap 
To: Thiem Alrik ; "mailman, r-help" 

Cc: 
Sent: Saturday, July 13, 2013 1:30 PM
Subject: Re: [R] Test for column equality across matrices

Try
   columnsOf <- function(mat) split(mat, col(mat))
   newB <- B[ , !is.element(columnsOf(B), columnsOf(A))]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Thiem Alrik
> Sent: Saturday, July 13, 2013 6:45 AM
> To: mailman, r-help
> Subject: [R] Test for column equality across matrices
> 
> Dear list,
> 
> I have two matrices
> 
> A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
> B <- combn(16, 3)
> 
> Now I would like to exclude all columns from the 560 columns in B which are 
> identical to
> any 1 of the 6 columns in A. How could I do this?
> 
> Many thanks and best wishes,
> 
> Alrik
> 


Re: [R] Test for column equality across matrices

2013-07-13 Thread arun
Hi,
One way would be:
 which(apply(t(B),1,paste,collapse="")%in%apply(t(A),1,paste,collapse=""))
#[1] 105 196 274 340 395
B[,105]
#[1]  1 15 16
 B[,196]
#[1]  2 15 16
 B1<-B[,!apply(t(B),1,paste,collapse="")%in%apply(t(A),1,paste,collapse="")]
 dim(B1)
#[1]   3 555
 dim(B)
#[1]   3 560

#or
B2<-B[,is.na(match(interaction(as.data.frame(t(B))),interaction(as.data.frame(t(A)))))]
 identical(B1,B2)
#[1] TRUE


A.K.
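
A side note on the paste() approach: with collapse="" two different columns could in principle produce the same string (for example 1,15,16 and 11,5,16 both give "11516"). That cannot happen with combn(16, 3), because its columns are sorted, but a separator makes the key unambiguous in general. A minimal sketch, where col_key is just a helper name made up for this illustration:

```r
A <- matrix(t(expand.grid(c(1, 2, 3, 4, 5), 15, 16)), nrow = 3)
B <- combn(16, 3)

# Build one text key per column; the "_" separator keeps
# c(1, 15, 16) and c(11, 5, 16) from collapsing to the same string
col_key <- function(m) apply(m, 2, paste, collapse = "_")

# TRUE for the columns of B that do not occur in A
keep <- !(col_key(B) %in% col_key(A))
B_new <- B[, keep, drop = FALSE]
dim(B_new)
```

drop = FALSE is only there so that the result stays a matrix even if a single column survives.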





- Original Message -
From: Thiem Alrik 
To: "mailman, r-help" 
Cc: 
Sent: Saturday, July 13, 2013 9:45 AM
Subject: [R] Test for column equality across matrices

Dear list,

I have two matrices

A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
B <- combn(16, 3)

Now I would like to exclude all columns from the 560 columns in B which are 
identical to any 1 of the 5 columns in A. How could I do this?

Many thanks and best wishes,

Alrik



Re: [R] "not all duplicated" question

2013-07-13 Thread arun
Hi,
May be this helps:
dat1<- read.table(text="
Country, Pet
France, Dog
France, Cat
France, Dog
Canada, Cat
Canada, Cat
Japan, Dog
Japan, Cat
Italy, Cat
",sep=",",header=TRUE,stringsAsFactors=FALSE)


 dat1[with(dat1,as.numeric(ave(Pet,Country,FUN=function(x) 
length(unique(x))))>1),]
#  Country  Pet
#1  France  Dog
#2  France  Cat
#3  France  Dog
#6   Japan  Dog
#7   Japan  Cat
A.K.
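
The same selection can also be written in two steps, which may be easier to read: count the distinct pets per country, then keep the rows of the countries with more than one. This is only a sketch on a re-typed copy of the example data; n_pets and res are names chosen for the illustration.

```r
dat1 <- data.frame(
  Country = c("France", "France", "France", "Canada", "Canada",
              "Japan", "Japan", "Italy"),
  Pet     = c("Dog", "Cat", "Dog", "Cat", "Cat", "Dog", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# Number of distinct pets recorded for each country
n_pets <- tapply(dat1$Pet, dat1$Country, function(p) length(unique(p)))

# Keep only the rows whose country has more than one distinct pet
res <- dat1[dat1$Country %in% names(n_pets)[n_pets > 1], ]
res
```

A country that appears only once (Italy) has one distinct pet by definition, so it is dropped together with the "all duplicated" countries (Canada).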



- Original Message -
From: Vesco Miloushev 
To: r-help@r-project.org
Cc: 
Sent: Saturday, July 13, 2013 4:12 PM
Subject: [R] "not all duplicated" question

Hi,

I want to select elements which have duplicates but are not all duplicated.

Here is what I mean. Suppose I have a two column matrix with columns
"Country" and "Pet"


Country, Pet
--
France, Dog
France, Cat
France, Dog
Canada, Cat
Canada, Cat
Japan, Dog
Japan, Cat
Italy, Cat

I want to extract all the entries that are duplicated in column
"Country" but not ALL duplicated in column "Pet".

In this case I want

Country, Pet
--
France, Dog
France, Cat
France, Dog
Japan, Dog
Japan, Cat

Notice that I keep France, because not all are duplicated. If there
was no entry "France, Cat" then all of the entries with "France"
would be eliminated.

Thanks for your help.



Re: [R] Matrix column flip when recycled

2013-07-14 Thread arun

library(plyr)
M.1[,1:2]<-do.call(rbind,alply(replicate(3,M.2),3,function(x) x))
#or

M.1[,1:2]<-matrix(aperm(replicate(3,M.2),c(1,3,2)),ncol=2)

A.K.
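
For what it is worth, the flip comes from column-major recycling: M.2 is treated as one vector of 20 values (column 1, then column 2), and that vector is reused to fill the 60 cells of M.1 column by column. A base-R sketch that avoids recycling altogether by repeating the rows of M.2 (not the method posted above, just an alternative):

```r
M.1 <- matrix(numeric(30 * 2), ncol = 2)
M.2 <- t(combn(1:5, 2))

# Repeat the row indices of M.2 three times, so the right-hand side
# is already 30 x 2 and nothing has to be recycled
M.1[, 1:2] <- M.2[rep(seq_len(nrow(M.2)), times = 3), ]

head(M.1, 12)
```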




- Original Message -
From: Thiem Alrik 
To: "mailman, r-help" 
Cc: 
Sent: Sunday, July 14, 2013 9:48 AM
Subject: [R] Matrix column flip when recycled

Dear list,

I have a matrix M.1 (30x2) into which I would like to paste another matrix M.2 
(10x2) three times. However, the columns get flipped in every odd-numbered 
recycle run. How can I avoid this behavior?

M.1 <- matrix(numeric(30*2), ncol = 2)
M.2 <- t(combn(1:5, 2))
M.1[, 1:2] <- M.2

Many thanks for help,

Alrik



Re: [R] creating dummy variables based on conditions

2013-07-14 Thread arun
Hi,
You could try this: (if I understand it correctly)
dat1<- read.table(text="
year    id var ans
 2010  1  1  1
 2010  2  0  0
 2010  1  0  1
2010  1  0  1
 2011  2  1  1
 2011  2  0  1
 2011  1  0  0
2011  1  0  0
",sep="",header=TRUE,stringsAsFactors=FALSE)

dat1$newres<-with(dat1,ave(var,id,year,FUN=function(x) any(x==1)*1))
 dat1
#  year id var ans newres
#1 2010  1   1   1  1
#2 2010  2   0   0  0
#3 2010  1   0   1  1
#4 2010  1   0   1  1
#5 2011  2   1   1  1
#6 2011  2   0   1  1
#7 2011  1   0   0  0
#8 2011  1   0   0  0

A.K.
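
In case the ave() call looks cryptic: ave() splits the first argument by the grouping variables, applies FUN to each group, and writes the result back in the original row order. A small self-contained sketch with the same numbers, where dat and flag are names made up for the illustration:

```r
dat <- data.frame(
  year = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  id   = c(1, 2, 1, 1, 2, 2, 1, 1),
  var  = c(1, 0, 0, 0, 1, 0, 0, 0)
)

# Within each id/year group: 1 if any row has var == 1, else 0.
# The single value returned by FUN is recycled over the rows of the group.
dat$flag <- ave(dat$var, dat$id, dat$year,
                FUN = function(v) as.numeric(any(v == 1)))
dat
```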

- Original Message -
From: Anup Nandialath 
To: r-help@r-project.org
Cc: 
Sent: Sunday, July 14, 2013 7:30 AM
Subject: [R] creating dummy variables based on conditions

Hello everyone,

I have a dataset which includes the first three variables from the demo
data below (year, id and var). I need to create the new variable ans as
follows

For each year, I need to create a new dummy variable, ans, that takes the
value 1 for every row of an id that has at least one record with var=1 in
that year, and 0 otherwise. Sample data with the output is shown below.

    year    id var ans
[1,] 2010  1   1   1
[2,] 2010  2   0   0
[3,] 2010  1   0   1
[4,] 2010  1   0   1
[5,] 2011  2   1   1
[6,] 2011  2   0   1
[7,] 2011  1   0   0
[8,] 2011  1   0   0

Any help on how to achieve this is much appreciated.

Thanks
Anup



Re: [R] simplify a dataframe

2013-07-14 Thread arun
Hi,
Maybe this helps you.
df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad" 
df1[48,8]
#[1] "31/12/4712" #strange value

df1[48,8]<- "31/12/2013"  #changed

indx<-as.numeric(interaction(df1[,1:6],drop=TRUE))
res<-do.call(rbind,lapply(split(df1,indx),function(x) {
  x1<- as.Date(x$Debut,format="%d/%m/%Y")
  x2<- as.Date(x$Fin,format="%d/%m/%Y")
  #start a new block whenever Debut is not the day after the previous Fin
  do.call(rbind,lapply(split(x,cumsum(c(FALSE,(x1[-1]-x2[-nrow(x)])!=1))),function(x)
    data.frame(x[1,1:6],Debut=head(x$Debut,1),Fin=tail(x$Fin,1),stringsAsFactors=FALSE)))
}))

 res[order(res$Matricule),]  #the order of rows is a bit different from df2.
#    Matricule    Nom     Sexe DateNaissance                contrat        Pays      Debut        Fin
#5           1  VERON  Féminin    02/09/1935             CDI commun      France 24/01/1995 31/12/1997
#4.0         6 BENARD Masculin    01/04/1935             CDI commun      France 13/03/1995 30/06/1995
#4.1         6 BENARD Masculin    01/04/1935             CDI commun      France 01/01/1996 31/01/1996
#10          6 BENARD Masculin    01/04/1935             CDI commun Philippines 02/02/1995 12/03/1995
#6           8 DALNIC  Féminin    19/02/1940             CDI commun      France 24/01/1995 31/08/1995
#9           8 DALNIC  Féminin    19/02/1940             CDI commun  Martinique 01/09/1995 29/02/2000
#1         934  FORNI Masculin    10/07/1961 CDD détaché ext. Cirad    Cameroun 26/01/1995 31/08/2001
#2         934  FORNI Masculin    10/07/1961             CDI commun       Congo 05/09/2012 31/12/2013
#3         934  FORNI Masculin    10/07/1961    CDI Détachés Autres       Congo 01/09/2004 31/08/2007
#7         934  FORNI Masculin    10/07/1961    CDI Détachés Autres      France 01/09/2001 31/08/2004
#8         934  FORNI Masculin    10/07/1961             CDI commun       Gabon 01/09/2007 04/09/2012


A.K.




From: Arnaud Michel 
To: arun  
Cc: R help ; jholt...@gmail.com; Rui Barradas 
 
Sent: Sunday, July 14, 2013 12:17 PM
Subject: Re: [R] simplify a dataframe



Hi,
Sorry for being unclear.

On 13/07/2013 17:18, arun wrote:

Hi,
"when the value of Debut of lines i = value Fin of lines i-1"
That part is not clear, especially when compared with the expected output 
(df2).

I want to group the lines that have the same characteristics (Matricule, Nom, 
Sexe, DateNaissance, Contrat, Pays) and whose periods (Debut/start to 
Fin/end) follow one another without a gap in time.
For example, the following three lines (the last two columns are Debut/Start 
and Fin/End):
1  VERON  Féminin    02/09/1935 CDI commun  France 24/01/1995 30/04/1997
1  VERON  Féminin    02/09/1935 CDI commun  France 01/05/1997 30/12/1997
1  VERON  Féminin    02/09/1935 CDI commun  France 31/12/1997 31/12/1997
are transformed into 1 line
1  VERON  Féminin    02/09/1935 CDI commun  France 24/01/1995 31/12/1997
because they have the same characteristics and cover a period without 
interruption (from 24/01/1995 to 31/12/1997).

The following six lines:
6 BENARD Masculin    01/04/1935 CDI commun Philippines 02/02/1995 27/02/1995
6 BENARD Masculin    01/04/1935 CDI commun Philippines 28/02/1995 28/02/1995
6 BENARD Masculin    01/04/1935 CDI commun Philippines 01/03/1995 12/03/1995
6 BENARD Masculin    01/04/1935 CDI commun  France 13/03/1995 30/06/1995
6 BENARD Masculin    01/04/1935 CDI commun  France 01/01/1996 30/01/1996
6 BENARD Masculin    01/04/1935 CDI commun  France 31/01/1996 31/01/1996
are transformed into
6 BENARD Masculin    01/04/1935 CDI commun Philippines 02/02/1995 12/03/1995
6 BENARD Masculin    01/04/1935 CDI commun  France 13/03/1995 30/06/1995
6 BENARD Masculin    01/04/1935 CDI commun  France 01/01/1996 31/01/1996
because
lines 1-3 have identical characteristics and no interruption in time;
line 4 and lines 5-6 are not grouped because there is an interruption in 
time between 30/06/1995 and 01/01/1996.

Thank you for your help
Michel
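
The grouping rule described above (merge consecutive rows when one period starts the day after the previous one ends) can be sketched on a toy data set. Everything here is an assumption made for illustration: the toy data frame, its single id column standing in for the six characteristic columns, and the helper name merge_spells. Rows are assumed to be sorted by start date within each id.

```r
toy <- data.frame(
  id    = c(1, 1, 1, 6, 6, 6),
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997",
            "13/03/1995", "01/01/1996", "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997",
            "30/06/1995", "30/01/1996", "31/01/1996"),
  stringsAsFactors = FALSE
)

merge_spells <- function(d) {
  start <- as.Date(d$Debut, format = "%d/%m/%Y")
  end   <- as.Date(d$Fin,   format = "%d/%m/%Y")
  # A new spell begins whenever a row does not start exactly one day
  # after the previous row ends
  new_spell <- c(FALSE, as.numeric(start[-1] - end[-length(end)]) != 1)
  spell <- cumsum(new_spell)
  # Collapse each spell to its first Debut and its last Fin
  do.call(rbind, lapply(split(d, spell), function(s)
    data.frame(id = s$id[1], Debut = s$Debut[1], Fin = s$Fin[nrow(s)],
               stringsAsFactors = FALSE)))
}

merged <- do.call(rbind, lapply(split(toy, toy$id), merge_spells))
rownames(merged) <- NULL
merged
```

With these toy rows, id 1 collapses to a single period and id 6 keeps two, because of the gap between 30/06/1995 and 01/01/1996.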


  Also, in your example dataset:
df1$contrat[grep("^CDD",df1$contrat)]
#[1] "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" "CDD détaché ext. Cirad"
#[4] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad"  "CDD détaché ext. Cirad"
#[7] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad"  "CDD détaché ext. Cirad"
##Looks like there are extra spaces in some of them.  I guess these are the same
df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad"
I tried this:
indx<-as.numeric(interaction(df1[,1:6],drop=FALSE))
df1New&l

Re: [R] Need hep for converting date data in POSIXct

2013-07-14 Thread arun
HI,
Try this:
Geo<- read.table(text="
long    lat.comp confianza
 9.31   -42.72 3
11.66  -40.63 9
10.88  -38.60 9
10.72 -37.86 9
13.06 -39.04 9
16.02 -38.51 6
",sep="",header=TRUE) 
 col1<- as.numeric(factor(Geo$confianza))
 with(Geo, plot(long,lat.comp,col=col1))
A.K.
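
If you want to pick the colours yourself and show a legend, one option is a named palette indexed by the confianza value. This is a sketch only: the three colour names are arbitrary choices, and lev, pal and cols are names made up for the illustration.

```r
Geo <- data.frame(
  long      = c(9.31, 11.66, 10.88, 10.72, 13.06, 16.02),
  lat.comp  = c(-42.72, -40.63, -38.60, -37.86, -39.04, -38.51),
  confianza = c(3, 9, 9, 9, 9, 6)
)

# One colour per distinct confianza value (colour names are arbitrary)
lev <- sort(unique(Geo$confianza))
pal <- setNames(c("blue", "orange", "red"), lev)

# Look up each point's colour by its confianza value
cols <- pal[as.character(Geo$confianza)]

plot(Geo$long, Geo$lat.comp, col = cols, pch = 19,
     xlab = "long", ylab = "lat.comp")
legend("topleft", legend = lev, col = pal, pch = 19, title = "confianza")
```

With more than three distinct values the palette vector would need one colour per level.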







From: laila Aranda Romero 
To: arun  
Sent: Sunday, July 14, 2013 3:28 PM
Subject: RE: [R] Need hep for converting date data in POSIXct




Arun, 

I am contacting you again because I have another difficulty with R.  I posted 
the following message, but it was not accepted by the forum filter, so I am 
not sure whether you can see it.

I have the following database: 

head(Geo) 
long    lat.comp confianza 
 9.31       -42.72         3 
11.66      -40.63         9 
10.88      -38.60         9 
10.72     -37.86         9 
13.06     -39.04         9 
16.02     -38.51         6 

I am trying to plot Geo$long versus Geo$lat.comp with different colours 
according to the value of Geo$confianza. I don't know how to make the palette 
and tell R to plot the points using this palette in the same graph.
Regards,
Laila


> Date: Thu, 11 Jul 2013 03:10:40 -0700
> From: smartpink...@yahoo.com
> Subject: Re: [R] Need hep for converting date data in POSIXct
> To: laila_...@hotmail.com
> 
> Hi Laila,
> No problem.
> Regards,
> Arun
> 
> 
> 
> 
> - Original Message -
> From: laila 
> To: r-help@r-project.org
> Cc: 
> Sent: Thursday, July 11, 2013 3:38 AM
> Subject: Re: [R] Need hep for converting date data in POSIXct
> 
> Arun, the last email was sent by accident. I have just found the 
> problem and it works now. Thank you very much.
> 
> Date: Wed, 10 Jul 2013 19:36:43 -0700
> From: ml-node+s789695n4671274...@n4.nabble.com
> To: laila_...@hotmail.com
> Subject: Re: Need hep for converting date data in POSIXct
> 
> 
> 
>     
> 
> Hi,
> 
> I guess the error message:
> 
> > vmask(lat,lon,time,vmax=25)
> 
> Error in vmask(lat, lon, time, vmax = 25) : object 'lat' not found
> 
> 
> says that you have not defined the object 'lat'.
> 
> 
> time<-subset(Geo, select =date)
> 
> time[,1]<-  as.POSIXct(time[,1],format="%d/%m/%Y %H:%M")
> 
> location<- subset(Geo,select=c(lat.comp,long))
> 
> time1<- time[,1]
> 
> lat<- location[,1]
> 
> long<- location[,2]
> 
> library(argosfilter)
> 
> vmask(lat,long,time1,25)
> 
> #[1] "end_location" "end_location" "not"          "not"          "end_location"
> 
> #[6] "end_location"
> 
> 
> A.K.
> 
> 
> 
> From: laila Aranda Romero <[hidden email]>
> 
> To: arun <[hidden email]> 
> 
> Sent: Wednesday, July 10, 2013 6:21 PM
> 
> Subject: RE: [R] Need hep for converting date data in POSIXct
> 
> 
> 
> 
> 
> 
> Hi,
> 
> 
> The code: 
> 
> 
> library(argosfilter)
> 
> setwd("C:/Users/Usuario/Dropbox/Laila Aranda/PUFGRA")
> 
> Geo = read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",
>     header=FALSE, sep = ",",
>     col.names= c("type", "date", "secs", "Trans1", "Trans2",
>                  "lat.sta", "lat.comp", "long",
>                  "dist", "rumbo", "velocidad", "confianza"))
> 
> View(Geo)
> 
> location=subset(Geo, select= c(lat.comp,long))
> 
> time=subset(Geo, select =c(date))
> 
> time[,1]<-as.POSIXct(time[,1],format="%d/%m/%Y %H:%M")
> 
> vmask(lat,lon,time,vmax=25)
> 
> 
> 
> 
> 
> The example:
> 
> > library(argosfilter)
> 
> > setwd("C:/Users/Usuario/Dropbox/LailaAranda/PUFGRA")
> 
> > Geo = read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",
> >     header=FALSE, sep = ",",
> >     col.names= c("type", "date", "secs", "Trans1", "Trans2", "lat.sta",
> >                  "lat.comp", "long", "dist", "rumbo", "velocidad", "confianza"))
> 
> > str(Geo)
> 
> 'data.frame':  582 obs. of  12 variables:
>  $ type     : Factor w/ 2 levels "midnight","noon": 2 1 2 1 2 1 2 1 2 1 ...
>  $ date     : Factor w/ 582 levels "01/01/2009 01:58",..: 370 389 390 409 410 429 430 450 451 471 ...
> 
> 
>

Re: [R] simplify a dataframe

2013-07-14 Thread arun
HI Michel,
This gives the same order as that of df2.
df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad"
df1[48,8]<- "31/12/2013"
indx<-as.numeric(interaction(df1[,1:6],drop=TRUE))
lst1<-split(df1,indx)
 lst2<-lst1[match(unique(indx),names(lst1))]
res<-do.call(rbind,lapply(lst2,function(x){
  x1<- as.Date(x$Debut,format="%d/%m/%Y")
  x2<- as.Date(x$Fin,format="%d/%m/%Y")
  do.call(rbind,lapply(split(x,cumsum(c(FALSE,(x1[-1]-x2[-nrow(x)])!=1))),function(x)
    data.frame(x[1,1:6],Debut=head(x$Debut,1),Fin=tail(x$Fin,1),stringsAsFactors=FALSE)))
}))
 row.names(res)<- 1:nrow(res)
 df2[11,8]<- "31/12/2013"
 names(res)[1]<- "Mat"
 identical(res,df2)
#[1] TRUE


A.K.




Re: [R] t-test across columns

2013-07-15 Thread arun
Hi,
Not sure about the format for the 2nd part.
df1<- x  ##data: the data frame posted via dput(x) in the original message

library(plyr)
df2<-ddply(df1,.(name,cat),summarize, 
cbind(t.test(val,df1$val)$statistic,t.test(val,df1$val)$p.value))
 df3<-cbind(df2[,1:2],data.frame(df2[,3]))
 colnames(df3)[3:4]<- c("t-val","p.val")
library(reshape2)
df3m<-  melt(df3,id.var=c("name","cat"))
xtabs(value~name+cat+variable,data=df3m)
, , variable = t-val

  cat
name  p178266580    p178269196    p178316310    p191287337    p195158904
  12.2 -1.1697701975 -5.2812696387 -1.2740973341  2.1926665883  0.1529759080
  15.9 -2.5063901671  0.00 -0.2169806106  1.5455008954 -1.6574358795
  cat
name  p196921846    p197427158    p238921966
  12.2  0.2260409495 -0.3320635130  3.3659689025
  15.9  6.6278680348  0.00  0.00

, , variable = p.val

  cat
name  p178266580    p178269196    p178316310    p191287337    p195158904
  12.2  0.3092408498  0.0003382099  0.3762474897  0.0419925673  0.8812900356
  15.9  0.0147796276  0.00  0.8365830321  0.1822041450  0.1096087365
  cat
name  p196921846    p197427158    p238921966
  12.2  0.8226135494  0.7435688987  0.0071990164
  15.9  0.0005489640  0.00  0.00

#or
res<-dcast(df3m,name~cat+variable,value.var="value")
row.names(res)<- res[,1]
 res1<- res[,-1]
res1
 p178266580_t-val p178266580_p.val p178269196_t-val p178269196_p.val
12.2 -1.16977   0.30924085 -5.28127 0.0003382099
15.9 -2.50639   0.01477963   NA   NA
 p178316310_t-val p178316310_p.val p191287337_t-val p191287337_p.val
12.2   -1.2740973    0.3762475 2.192667   0.04199257
15.9   -0.2169806    0.8365830 1.545501   0.18220414
 p195158904_t-val p195158904_p.val p196921846_t-val p196921846_p.val
12.2    0.1529759    0.8812900    0.2260409  0.822613549
15.9   -1.6574359    0.1096087    6.6278680  0.000548964
 p197427158_t-val p197427158_p.val p238921966_t-val p238921966_p.val
12.2   -0.3320635    0.7435689 3.365969  0.007199016
15.9   NA   NA   NA   NA

A.K.


- Original Message -
From: Nico Met 
To: R help 
Cc: 
Sent: Monday, July 15, 2013 11:50 AM
Subject: [R] t-test across columns

Dear all,

I would like to do t-test across two columns "name" with different "cat"
with overall mean ("val").

(Removing if there is a single observation)

And finally, make a matrix with t-value and p-value associated with a name
(in rows) and cat (in columns)

dput(x)
structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("12.2", "15.9"
), class = "factor"), cat = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("p178266580",
"p178269196", "p178316310", "p191287337", "p195158904", "p196921846",
"p197427158", "p238921966"), class = "factor"), val = c(148.90772,
184.253375, 183.97486667, 191.868125, 173.30515, 187.876975,
177.453775, 184.799525, 212.39065, 205.504525, 186.152025, 194.337075,
193.2703, 204.71665, 211.4452, 202.609175, 203.72918, 193.7261,
196.1186, 202.79556, 203.48818, 191.13744, 205.23315, 198.66842,
196.81032, 200.90512, 206.13564, 205.372225, 196.22835, 211.04686,
219.9771, 224.7602, 231.6596, 211.10581667, 215.44474,
210.83514, 228.173125, 224.09034, 212.96026, 239.0085, 213.5407,
227.12115, 209.24888, 232.8964, 232.22146, 228.1643, 236.43082,
232.20792, 238.49192, 224.64014, 233.75898, 207.06138, 215.3649,
211.14802, 201.86854, 200.52278, 199.05752, 194.90904, 214.44334,
249.35726667, 239.98525, 234.50848333, 243.86508333,
233.59581667, 248.1219, 225.28941667, 248.22088333,
193.69566, 198.43578, 205.06055, 208.525975, 198.28692, 206.88496,
201.60162, 205.7943, 210.5117, 196.69886, 193.58288, 198.86094,
201.81676, 225.8266, 205.879725, 218.370475, 214.006125, 198.74038,
206.00314, 198.37446, 225.5357, 216.721025, 226.543925, 158.1011,
158.15674, 166.07518, 179.942225, 158.16046, 165.0685, 159.56146
)), .Names = c("name", "cat", "val"), class = "data.frame", row.names = c(
NA,
97L))

Thanks

Nico


Re: [R] t-test across columns

2013-07-15 Thread arun
Hi,
Maybe I misunderstood your question.
The output David got could be also obtained by:
#df1 dataset

library(plyr)

df2<-ddply(df1,.(cat),function(x) if(min(table(x$name))>1){x1<- 
t.test(val~name,x);cbind(t=x1$statistic,p.value=x1$p.value)})
 df2
# cat  t  p.value
#1 p178266580 -0.1156475 0.9144054453
#2 p178316310 -1.0874356 0.4143944591
#3 p191287337 -0.6776053 0.5315717871
#4 p195158904  1.1522850 0.2769290728
#5 p196921846 -4.2342996 0.0003925339

But, the second part is still unclear.

A.K.
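
The same per-cat comparison can also be written in base R with split() and lapply(). This is a sketch on simulated data, not on the posted data set: the toy data frame, its group labels and the seed are all made up, so the numbers it prints mean nothing beyond showing the shape of the result.

```r
set.seed(1)
toy <- data.frame(
  cat  = rep(c("a", "b", "c"), times = c(8, 8, 3)),
  name = c(rep(c("g1", "g2"), each = 4),
           rep(c("g1", "g2"), each = 4),
           "g1", "g1", "g2"),
  val  = rnorm(19, mean = 200, sd = 10)
)

per_cat <- lapply(split(toy, toy$cat), function(d) {
  # Skip a cat when either name has fewer than two observations
  if (min(table(d$name)) > 1) {
    tt <- t.test(val ~ name, data = d)
    c(t = unname(tt$statistic), p.value = tt$p.value)
  }
})

# Drop the skipped cats (NULL entries) and stack the rest into a matrix
res <- do.call(rbind, Filter(Negate(is.null), per_cat))
res
```

Here cat "c" is dropped because name "g2" occurs only once in it.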


- Original Message -
From: David Carlson 
To: 'Nico Met' ; 'R help' 
Cc: 
Sent: Monday, July 15, 2013 1:33 PM
Subject: Re: [R] t-test across columns

This may be close to what you want:

> t.val <- by(x, x$cat, function(y) if (min(table(y$name)>1)) {
+      t.test(val~name, y)})
> t.out <- do.call(rbind, sapply(t.val, function(y) c(y$statistic, 
+       p.value=y$p.value)))
> t.out
                    t      p.value
p178266580 -0.1156475 0.9144054453
p178316310 -1.0874356 0.4143944591
p191287337 -0.6776053 0.5315717871
p195158904  1.1522850 0.2769290728
p196921846 -4.2342996 0.0003925339

But I'm not sure what you mean about columns for each cat unless you
want the frequencies: 

> freq.out <- xtabs(~cat+name, x)
> freq.out <- freq.out[apply(freq.out, 1, function(y) min(y) > 1),]
> freq.out
            name
cat          12.2 15.9
  p178266580    4   11
  p178316310    2    3
  p191287337    3    5
  p195158904    8    7
  p196921846   26    5
> results <- cbind(freq.out, t.out)
> results
           12.2 15.9          t      p.value
p178266580    4   11 -0.1156475 0.9144054453
p178316310    2    3 -1.0874356 0.4143944591
p191287337    3    5 -0.6776053 0.5315717871
p195158904    8    7  1.1522850 0.2769290728
p196921846   26    5 -4.2342996 0.0003925339

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Nico Met
Sent: Monday, July 15, 2013 10:50 AM
To: R help
Subject: [R] t-test across columns

Dear all,

I would like to do t-test across two columns "name" with different
"cat"
with overall mean ("val").

(Removing if there is a single observation)

And finally, make a matrix with t-value and p-value associated with
a name
(in rows) and cat (in columns)

dput(x)
structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("12.2", "15.9"
), class = "factor"), cat = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c("p178266580",
"p178269196", "p178316310", "p191287337", "p195158904",
"p196921846",
"p197427158", "p238921966"), class = "factor"), val = c(148.90772,
184.253375, 183.97486667, 191.868125, 173.30515, 187.876975,
177.453775, 184.799525, 212.39065, 205.504525, 186.152025,
194.337075,
193.2703, 204.71665, 211.4452, 202.609175, 203.72918, 193.7261,
196.1186, 202.79556, 203.48818, 191.13744, 205.23315, 198.66842,
196.81032, 200.90512, 206.13564, 205.372225, 196.22835, 211.04686,
219.9771, 224.7602, 231.6596, 211.10581667, 215.44474,
210.83514, 228.173125, 224.09034, 212.96026, 239.0085, 213.5407,
227.12115, 209.24888, 232.8964, 232.22146, 228.1643, 236.43082,
232.20792, 238.49192, 224.64014, 233.75898, 207.06138, 215.3649,
211.14802, 201.86854, 200.52278, 199.05752, 194.90904, 214.44334,
249.35726667, 239.98525, 234.50848333, 243.86508333,
233.59581667, 248.1219, 225.28941667, 248.22088333,
193.69566, 198.43578, 205.06055, 208.525975, 198.28692, 206.88496,
201.60162, 205.7943, 210.5117, 196.69886, 193.58288, 198.86094,
201.81676, 225.8266, 205.879725, 218.370475, 214.006125, 198.74038,
206.00314, 198.37446, 225.5357, 216.721025, 226.543925, 158.1011,
158.15674, 166.07518, 179.942225, 158.16046, 165.0685, 159.56146
)), .Names = c("name", "cat", "val"), class = "data.frame",
row.names = c(
NA,
97L))

Thanks

Nico


Re: [R] file.stem?

2013-07-15 Thread arun
Hi,
May be this also works.
library(tools)
basename(file_path_sans_ext("/the/path/to/afile.txt"))
#[1] "afile"
A.K.




- Original Message -
From: Rui Barradas 
To: Witold E Wolski 
Cc: r-help@r-project.org
Sent: Monday, July 15, 2013 10:32 AM
Subject: Re: [R] file.stem?

Hello,

You can use ?basename to write a file.stem function:


basename("/the/path/to/afile.txt")

file.stem <- function(x){
    bn <- basename(x)
    gsub("\\..*$", "", bn)
}
file.stem("/the/path/to/afile.txt")
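
One thing to be aware of: the pattern "\\..*$" removes everything from the first dot onwards, whereas file_path_sans_ext() from the tools package removes only the last extension. The two agree for "afile.txt" but differ for names with several dots. A short comparison, where the second path is a made-up example:

```r
library(tools)  # provides file_path_sans_ext()

paths <- c("/the/path/to/afile.txt", "/the/path/to/archive.tar.gz")

# Strip from the first dot onwards (same pattern as in file.stem above)
stem_first_dot <- sub("\\..*$", "", basename(paths))

# Strip only the last extension
stem_last_ext <- file_path_sans_ext(basename(paths))

stem_first_dot
stem_last_ext
```

Which behaviour is the right one depends on whether "archive.tar" or "archive" should count as the stem.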



Hope this helps,

Rui Barradas

On 15-07-2013 15:23, Witold E Wolski wrote:
> Looking for a function which returns the stem of the filename given a path.
> i.e.
>> file.stem("/the/path/to/afile.txt")
>> afile
>
> regards
>



Re: [R] converting numeric to character and using character pattern

2013-07-15 Thread arun
HI Irucka,
May be this is what you wanted: 
pat<-paste(paste0("http://www.",siter[,1],"..rdb"),collapse="|")
 pat
#[1] "http://www.02437100..rdb|http://www.02439500..rdb|http://www.02441500..rdb|http://www.02446500..rdb|http://www.02467000..rdb|


A.K.



Hi, I am having a problem with my data set and conversion from numeric to 
character. 

Below is my code with comments on the specific problem below: 

hydraulicsites <- read.table("hydraulic_geometry_sites.csv", 
header = TRUE, sep = "\t", as.is = TRUE, stringsAsFactors = FALSE, 
colClasses = c("character",NA)) 
siter <- hydraulicsites[1] 

dput(siter) 
structure(list(site_no = c("02437100", "02439500", "02441500", 
"02446500", "02467000", "02470050", "03217500", "03219500", "03220510", 
"03227500", "03230700", "03231500", "03455000", "03497000", "03439000", 
"03439500", "0344", "03441000", "03454500", "03479000", "03513500", 
"04177500", "04183500", "04185000", "04185500", "04186500", "04187500", 
"04188000", "04189000", "04189500", "0419", "04191500", "04192500", 
"04193500", "06191500", "06214500", "06218500", "06222000", "06225500", 
"06228000", "06235500", "06259500", "06262000", "06264000", "06266000", 
"06269500", "06273000", "06276500", "06277500", "06279500", "06287000", 
"06288500", "06288500", "06289000", "06290500", "06293500", "06294700", 
"06329500", "0631", "06309500", "06312500", "06311000", "06311500", 
"06313000", "06313500", "06315500", "06314000", "06315000", "06316500", 
"06317000", "06317500", "06318500", "06319500", "0632", "06320500", 
"06323000", "06323500", "06324000", "06325500", "06324500", "06326500", 
"06426500", "06428000", "06428500", "06436000", "06437000", "06438000", 
"06821500", "0683", "06850500", "06856600", "0686", "06862500", 
"06864000", "06864500", "06865500", "06866000", "06877600", "06879500", 
"06887500", "06889000", "06891000", "06892500", "06342500", "0644", 
"06818000", "06893000", "06934500", "05587500", "0701", "07032000", 
"07289000")), .Names = "site_no", class = "data.frame", row.names = c(NA, 
-112L)) 

pat <- paste("http://www.", siter, "..rdb", sep="", collapse = "|") 
str(pat) 
 chr "www.c(\"02437100\", \"02439500\", \"02441500\", \"02446500\", 
\"02467000\""| __truncated__ 


OK, the problem is with pat. I need for pat to be the same 
as patter. I have a list of sites in .csv files that I need to process 
so I would like a more efficient way of doing the process than is shown 
below. 

Is there a way to get the results in pat to resemble those in patter? 

sites3 <- c("07103990", "402114105350101", "05056215") 
patter <- paste("www.", sites3, "..rdb", sep="", collapse = "|") 
dput(patter) 
"www.07103990..rdb|www.402114105350101..rdb|www.05056215..rdb" 


Thank you. 

Irucka Embry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] converting numeric to character and using character pattern

2013-07-15 Thread arun
#or
pat1<-paste("http://www.", siter[,1], "..rdb", sep="", collapse = "|") 
 identical(pat,pat1)
#[1] TRUE
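
Why the column has to be extracted first: a data frame is a list, so paste() converts the whole column into one deparsed "c(...)" string instead of one string per site. A minimal sketch, assuming a hypothetical three-row stand-in for `siter` (not the poster's full data):

```r
# Hypothetical three-row stand-in for the poster's `siter` data frame
siter <- data.frame(site_no = c("02437100", "02439500", "02441500"),
                    stringsAsFactors = FALSE)

# Passing the data frame itself: paste() sees a one-element list,
# so the whole column is deparsed into a single "c(...)" string
bad <- paste("www.", siter, "..rdb", sep = "", collapse = "|")

# Passing the column as a character vector: one string per site,
# then collapsed with "|"
good <- paste("www.", siter[[1]], "..rdb", sep = "", collapse = "|")
good
```

`siter[[1]]`, `siter[, 1]` and `siter$site_no` should all give the same character vector here.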
A.K.



- Original Message -
From: arun 
To: Irucka Embry 
Cc: R help 
Sent: Monday, July 15, 2013 2:47 PM
Subject: Re: converting numeric to character and using character pattern

Hi Irucka,
Maybe this is what you wanted: 
pat<-paste(paste0("http://www.",siter[,1],"..rdb"),collapse="|")
 pat
[1] 
"http://www.02437100..rdb|http://www.02439500..rdb|http://www.02441500..rdb|http://www.02446500..rdb|http://www.02467000..rdb|


A.K.



Hi, I am having a problem with my data set and conversion from numeric to 
character. 

Below is my code with comments on the specific problem below: 

hydraulicsites <- read.table("hydraulic_geometry_sites.csv", 
header = TRUE, sep = "\t", as.is = TRUE, stringsAsFactors = FALSE, 
colClasses = c("character",NA)) 
siter <- hydraulicsites[1] 

dput(siter) 
structure(list(site_no = c("02437100", "02439500", "02441500", 
"02446500", "02467000", "02470050", "03217500", "03219500", "03220510", 
"03227500", "03230700", "03231500", "03455000", "03497000", "03439000", 
"03439500", "0344", "03441000", "03454500", "03479000", "03513500", 
"04177500", "04183500", "04185000", "04185500", "04186500", "04187500", 
"04188000", "04189000", "04189500", "0419", "04191500", "04192500", 
"04193500", "06191500", "06214500", "06218500", "06222000", "06225500", 
"06228000", "06235500", "06259500", "06262000", "06264000", "06266000", 
"06269500", "06273000", "06276500", "06277500", "06279500", "06287000", 
"06288500", "06288500", "06289000", "06290500", "06293500", "06294700", 
"06329500", "0631", "06309500", "06312500", "06311000", "06311500", 
"06313000", "06313500", "06315500", "06314000", "06315000", "06316500", 
"06317000", "06317500", "06318500", "06319500", "0632", "06320500", 
"06323000", "06323500", "06324000", "06325500", "06324500", "06326500", 
"06426500", "06428000", "06428500", "06436000", "06437000", "06438000", 
"06821500", "0683", "06850500", "06856600", "0686", "06862500", 
"06864000", "06864500", "06865500", "06866000", "06877600", "06879500", 
"06887500", "06889000", "06891000", "06892500", "06342500", "0644", 
"06818000", "06893000", "06934500", "05587500", "0701", "07032000", 
"07289000")), .Names = "site_no", class = "data.frame", row.names = c(NA, 
-112L)) 

pat <- paste("www.", siter, "..rdb", sep="", collapse = "|") 
str(pat) 
 chr "www.c(\"02437100\", \"02439500\", \"02441500\", \"02446500\", 
\"02467000\""| __truncated__ 


OK, the problem is with pat. I need for pat to be the same 
as patter. I have a list of sites in .csv files that I need to process 
so I would like a more efficient way of doing the process than is shown 
below. 

Is there a way to get the results in pat to resemble those in patter? 

sites3 <- c("07103990", "402114105350101", "05056215") 
patter <- paste("www.", sites3, "..rdb", sep="", collapse = "|") 
dput(patter) 
"www.07103990..rdb|www.402114105350101..rdb|www.05056215..rdb" 


Thank you. 

Irucka Embry



Re: [R] Deleting specific rows from a dataframe

2013-07-15 Thread arun
Hi,
If I understand it correctly,
df1<- read.table(text="
sample1 sample2 sample3 sample4 sample5  
 a P P I P P
 b P A P P A
 c P P P P P
 d P P P P P
 e M P M A P
 f P P P P P
 g P P P A P
 h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c   P   P   P   P   P
#d   P   P   P   P   P
#f   P   P   P   P   P
#h   P   P   P   P   P
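
The one-liner above can be unpacked into steps. This is only a sketch on a shortened copy of the example data; the intermediate names (`is_p`, `n_p`, `keep`) are made up for illustration.

```r
# Shortened copy of the example data (rows a to e only)
df1 <- read.table(text = "
sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
", header = TRUE, stringsAsFactors = FALSE)

is_p <- df1 == "P"          # logical matrix with the same shape as df1
n_p  <- rowSums(is_p)       # number of "P" calls in each row
keep <- n_p == ncol(df1)    # TRUE only where every column is "P"

res <- df1[keep, ]
rownames(res)
```

If the data could contain NA, rowSums() would return NA for those rows; passing na.rm = TRUE would count a missing value as "not P", which may or may not be what is wanted.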
A.K.



- Original Message -
From: Chirag Gupta 
To: r-help@r-project.org
Cc: 
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe

I have a data frame like shown below

  sample1 sample2 sample3 sample4 sample5
a       P       P       I       P       P
b       P       A       P       P       A
c       P       P       P       P       P
d       P       P       P       P       P
e       M       P       M       A       P
f       P       P       P       P       P
g       P       P       P       A       P
h       P       P       P       P       P

I want to keep only those rows that have "P" in every column.

Since the matrix is large (about 20,000 rows), I cannot do this in Excel.

Is there a function I can use?
-- 
*Chirag Gupta*

    [[alternative HTML version deleted]]



Re: [R] Deleting specific rows from a dataframe

2013-07-15 Thread arun
You mentioned a data.frame in one place and a matrix in another.  A matrix would
be faster.

#Speed comparison
set.seed(1454)
dfTest<- as.data.frame(matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5))

system.time(res<-dfTest[rowSums(dfTest=="P")==ncol(dfTest),])
#   user  system elapsed 
#  0.628   0.020   0.649 
 dim(res)
#[1] 952   5


set.seed(1454)
mat1<- matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5)
system.time(res1<-mat1[rowSums(mat1=="P")==ncol(mat1),])
# user  system elapsed 
#  0.188   0.004   0.194 
dim(res1)
#[1] 952   5

#Other options include
system.time(res3<- dfTest[apply(sweep(dfTest,1,"P","=="),1,all),])
#   user  system elapsed 
#  5.988   0.120   6.120 
 identical(res,res3)
#[1] TRUE



system.time(res2<- dfTest[apply(dfTest,1, function(x) 
all(length(table(x))==ncol(dfTest) | names(table(x))=="P")  ), ])
#   user  system elapsed 
#351.492   0.040 352.164 
row.names(res2)<- row.names(res3)
attr(res3,"row.names")<- attr(res2,"row.names")
 identical(res2,res3)
#[1] TRUE


A.K.

- Original Message -
From: arun 
To: Chirag Gupta 
Cc: R help 
Sent: Monday, July 15, 2013 9:23 PM
Subject: Re: [R] Deleting specific rows from a dataframe

Hi,
If I understand it correctly,
df1<- read.table(text="
sample1 sample2 sample3 sample4 sample5  
 a P P I P P
 b P A P P A
 c P P P P P
 d P P P P P
 e M P M A P
 f P P P P P
 g P P P A P
 h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c   P   P   P   P   P
#d   P   P   P   P   P
#f   P   P   P   P   P
#h   P   P   P   P   P
A.K.



- Original Message -
From: Chirag Gupta 
To: r-help@r-project.org
Cc: 
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe

I have a data frame like shown below

  sample1 sample2 sample3 sample4 sample5
a       P       P       I       P       P
b       P       A       P       P       A
c       P       P       P       P       P
d       P       P       P       P       P
e       M       P       M       A       P
f       P       P       P       P       P
g       P       P       P       A       P
h       P       P       P       P       P

I want to keep only those rows that have "P" in every column.

Since the matrix is large (about 20,000 rows), I cannot do this in Excel.

Is there a function I can use?
-- 
*Chirag Gupta*

    [[alternative HTML version deleted]]



Re: [R] (1 - 0.7) == 0.3

2013-07-16 Thread arun
Hi,
2-0.7==0.3
#[1] FALSE
## Maybe you meant
 2-0.7==1.3
#[1] TRUE


This is most likely R FAQ 7.31 ("Why doesn't R think these numbers are equal?").
Also, check
http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

all.equal(2-0.7,1.3)
#[1] TRUE
 all.equal(1-0.7,0.3)
#[1] TRUE
(1-0.7)<(0.3+.Machine$double.eps^0.5)
#[1] TRUE



 p <- c(0.2, 0.4, 0.6, 0.8, 1) 
round((1-p)*5,1)+1
#[1] 5 4 3 2 1


In your second example,
 p <- c(0.8, 0.6, 0.4, 0.2, 0)
floor((1 - p) * 5) + 1 
#[1] 1 3 4 5 6  
((1-0.8)*5) +1
#[1] 2


 round((1-p)*5,1)+1
#[1] 2 3 4 5 6
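
Another option, if floor() has to stay, is to nudge the value up by a small tolerance before flooring. This is only a sketch; the tolerance `tol` is an assumption and should be much smaller than the spacing of the values being binned.

```r
p <- c(0.2, 0.4, 0.6, 0.8, 1)

# Assumed tolerance: sqrt of machine epsilon, roughly 1.5e-8
tol <- sqrt(.Machine$double.eps)

# (1 - 0.8) * 5 is slightly below 1 in floating point;
# adding tol pushes it back over the boundary before floor()
bins <- floor((1 - p) * 5 + tol) + 1
bins

# Same idea for the second example
p2 <- c(0.8, 0.6, 0.4, 0.2, 0)
bins2 <- floor((1 - p2) * 5 + tol) + 1
bins2
```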

A.K.

...is false :( However (2 - 0.7) == 0.3 is true. 

Is there any way to get around this?   

The end goal is for this to work: 

p <- c(0.2, 0.4, 0.6, 0.8, 1) 
floor((1 - p) * 5) + 1 

>  5 4 3 1 1 

whereas the correct result would have been 5 4 3 2 1. If I set p <- c(0.8, 0.6, 
0.4, 0.2, 0) then it works as expected.



Re: [R] Errors using large numbers ((i) all entries of 'x' must be nonnegative and finite and (ii) NAs introduced by coercion)

2013-07-16 Thread arun


HI,
?as.integer() #documentation
Note that current implementations of R use 32-bit integers for
 integer vectors, so the range of representable integers is
 restricted to about +/-2*10^9: ‘double’s can hold much larger
 integers exactly.
as.numeric(c(75533, 4756922556, 88210, 6715122129))
#[1]  75533 4756922556  88210 6715122129
#or
 as.double(c(75533, 4756922556, 88210, 6715122129))
#[1]  75533 4756922556  88210 6715122129
A.K.





- Original Message -
From: PIKAL Petr 
To: jgibbons1 ; "r-help@r-project.org" 

Cc: 
Sent: Tuesday, July 16, 2013 12:54 PM
Subject: Re: [R] Errors using large numbers ((i) all entries of 'x' must be 
nonnegative and finite and (ii) NAs introduced by coercion)

Well,

You could find it yourself,

as.integer(c(75533, 4756922556, 88210, 6715122129))
[1] 75533    NA 88210    NA
Warning message:
NAs introduced by coercion 
> matrix(c(75533, 4756922556, 88210, 6715122129), nrow=2)
           [,1]       [,2]
[1,]      75533      88210
[2,] 4756922556 6715122129

as.integer() returns NA for these values because the integer type has a limited range.

Petr


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of jgibbons1
> Sent: Tuesday, July 16, 2013 4:44 PM
> To: r-help@r-project.org
> Subject: [R] Errors using large numbers ((i) all entries of 'x' must be
> nonnegative and finite and (ii) NAs introduced by coercion)
> 
> Hello,
> I am fairly new to R, so please forgive me if this is a fairly easy
> solution.
> 
> I am trying to perform multiple Fisher's Exact tests or Pearson's Chi-
> squared contingency tests from a datamatrix in which data from each row
> is data for an independent test.
> 
> My data is formatted as such:
> 
> AAA 75533 4756922556 88210 6715122129
> BBB 14869 4756983220 16384 6715193955
> CCC  7230 4756990859  8559 6715201780
> DDD 18332 4756979757 23336 6715187003
> EEE 14733 4756983356 16826 6715193513
> FFF  2918 4756995171  3433 6715206906
> GGG  3726 4756994363  4038 6715206301
> HHH  6196 4756991893  7011 6715203328
> III  7925 4756990164  9130 6715201209
> JJJ  1434 4756996655  1602 6715208737
> 
> Where the 1st column is the identifier, the 2nd column = observations
> 1, the 3rd column = background counts 1, the 4th column = observations
> 2 and the 5th column = background counts 2.
> 
> I am loading my data as such:
> 
>      > data=read.table("My.File", header=FALSE)
> 
> And I am looping through each row to perform a test like this:
> 
>      > pvalues=c("pvalue")
>      > for(i in 1:10){
>      + datamatrix=matrix(c(as.integer(data[i,2:5])),nrow=2)
>      + fisherresult=fisher.test(datamatrix)
>      + pvalues=cbind(pvalues,fisherresult[1])
>      + }
> 
> Here is the Error I am Getting:
> 
> Error in fisher.test(datamatrix) :
>   all entries of 'x' must be nonnegative and finite In addition:
> Warning messages:
> 1: In matrix(c(as.integer(data[i, 2:5])), nrow = 2) :
>   NAs introduced by coercion
> 2: In matrix(c(as.integer(data[i, 2:5])), nrow = 2) :
>   NAs introduced by coercion
> 
> 
> When I replace the large number in the 3rd and 5th column with smaller
> numbers, the statistical calculation works fine.
> 
> Any ideas? Any help would be GREATLY appreciated!
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Errors-
> using-large-numbers-i-all-entries-of-x-must-be-nonnegative-and-finite-
> and-ii-NAs-introduced-b-tp4671685.html
> Sent from the R help mailing list archive at Nabble.com.
> 


Re: [R] How to remove attributes from scale() in a matrix?

2013-07-16 Thread arun
HI,
Try:
x1<-scale(x,center=TRUE,scale=TRUE)
str(x1)
# num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
# - attr(*, "scaled:center")= num [1:10] 50.2 50 49.8 49.8 50.3 ...
 #- attr(*, "scaled:scale")= num [1:10] 1.109 0.956 0.817 0.746 1.019 ...

 attr(x1,"scaled:center")<-NULL
 attr(x1,"scaled:scale")<-NULL
str(x1)
 #num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
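
If this is needed in several places, the two assignments can be wrapped in a small helper. A sketch only: `strip_scale_attrs` is a made-up name, and the matrix below is a stand-in built with rnorm() rather than rmvnorm().

```r
set.seed(42)
# Stand-in for the rmvnorm() output: 15 x 10, mean 50
x <- matrix(rnorm(150, mean = 50), nrow = 15)

# Hypothetical helper: drop only the two attributes added by scale()
strip_scale_attrs <- function(m) {
  attr(m, "scaled:center") <- NULL
  attr(m, "scaled:scale")  <- NULL
  m
}

z <- strip_scale_attrs(scale(x))
names(attributes(z))   # only "dim" should remain
```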
A.K.




- Original Message -
From: C W 
To: r-help 
Cc: 
Sent: Tuesday, July 16, 2013 3:59 PM
Subject: [R] How to remove attributes from scale() in a matrix?

Hi list,

I am using scale() to standardize a distribution.  But why does it
attach attributes to the data?  I just want a standardized
matrix, that is all.

library(mvtnorm)
> x <- rmvnorm(15, mean=rep(50, 10))
> x
          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
  [,8]     [,9]
[1,] 51.17519 52.34341 49.63084 47.99234 51.63113 50.91391 49.36819
49.23901 51.17377
[2,] 50.57039 49.17210 48.64395 49.03940 49.65761 49.93840 49.94883
50.69044 49.57632
[3,] 50.64811 50.21503 50.13786 49.15879 48.51550 50.19444 50.23710
50.98040 51.37032
[4,] 49.22797 49.66445 49.93287 48.63681 50.49457 50.33302 52.29552
49.98424 51.04724
[5,] 49.72099 50.84510 50.60976 49.60883 53.59509 49.14728 50.23134
49.09141 49.23780
[6,] 49.49126 50.90938 49.67140 50.08951 49.79854 49.03711 50.26037
50.24975 48.26958
[7,] 51.12384 47.92778 50.60112 49.01554 49.47515 50.12756 51.65216
49.21998 49.63808
[8,] 51.45123 50.44037 50.01039 50.27511 49.97658 51.63002 50.37156
50.02685 48.95423
[9,] 51.16989 50.16200 51.17724 50.71678 50.79565 50.27128 51.05608
49.61165 47.81732
[10,] 49.54263 49.93501 49.71762 49.33378 51.44935 51.53775 50.54346
49.98333 49.59422
[11,] 51.16497 49.82914 49.08821 51.02918 49.67663 49.53498 50.26647
49.48569 50.94504
[12,] 51.16827 50.50244 49.13003 49.00155 50.26457 48.85465 49.11593
50.58031 51.14926
[13,] 48.26216 49.94866 48.62526 49.11995 50.40082 49.25359 48.57677
50.66760 49.44108
[14,] 49.82530 49.17352 50.05588 50.51265 51.04926 50.32474 49.78180
50.48349 49.92431
[15,] 50.55772 49.84691 47.95021 50.24911 49.85335 50.73062 51.48718
51.36693 50.18307
         [,10]
[1,] 50.13859
[2,] 51.54920
[3,] 49.23230
[4,] 50.92683
[5,] 50.97708
[6,] 50.78799
[7,] 50.53913
[8,] 49.30832
[9,] 49.43606
[10,] 49.42060
[11,] 50.21002
[12,] 51.94848
[13,] 49.41352
[14,] 52.24064
[15,] 51.19474
> scale(x, center=TRUE, scale=TRUE)
            [,1]       [,2]         [,3]        [,4]        [,5]
[,6]        [,7]
[1,]  0.8890317  2.3390090 -0.040395734 -1.86089754  1.00159470
0.92533476 -0.99715965
[2,]  0.2452502 -0.9109703 -1.190404546 -0.63771097 -0.66104313
-0.21446975 -0.40514793
[3,]  0.3279747  0.1578297  0.550427419 -0.49823662 -1.62323564
0.08468695 -0.11121941
[4,] -1.1837031 -0.4064112  0.311551281 -1.10802250  0.04407804
0.24660932  1.98754311
[5,] -0.6589074  0.8035298  1.100314901  0.02749734  2.65618150
-1.13883336 -0.11709623
[6,] -0.9034419  0.8694088  0.006865424  0.58904255 -0.54231158
-1.26755243 -0.08749646
[7,]  0.8343705 -2.1861602  1.090250934 -0.66558751 -0.81476108
0.00655050  1.33157578
[8,]  1.1828615  0.3887665  0.401888014  0.80585326 -0.39231482
1.76205433  0.02587038
[9,]  0.8833860  0.1034854  1.761589113  1.32181395  0.29773018
0.17447101  0.72381232
[10,] -0.8487569 -0.1291363  0.060728488 -0.29381647  0.84844800
1.65423826  0.20113824
[11,]  0.8781560 -0.2376361 -0.672712386  1.68676651 -0.64501776
-0.68583461 -0.08127647
[12,]  0.8816611  0.4523675 -0.623990804 -0.68193230 -0.14968994
-1.48074503 -1.25436403
[13,] -2.2117715 -0.1151423 -1.212190165 -0.54361597 -0.03490809
-1.01461386 -1.80409304
[14,] -0.5478718 -0.9095198  0.454889973  1.08335795  0.51138447
0.23693367 -0.57544865
[15,]  0.2317608 -0.2194204 -1.998811911  0.77548830 -0.49613484
0.71117025  1.16336204
            [,8]        [,9]       [,10]
[1,] -1.2666791  1.17934107 -0.35061189
[2,]  0.8423439 -0.28600703  1.06389274
[3,]  1.2636749  1.35963155 -1.25940610
[4,] -0.1838111  1.06327078  0.43980595
[5,] -1.4811512 -0.59653247  0.49019839
[6,]  0.2019965 -1.48467432  0.30058779
[7,] -1.2943281 -0.22935616  0.05103467
[8,] -0.1218950 -0.85664924 -1.18316969
[9,] -0.7252082 -1.89953387 -1.05507780
[10,] -0.1851315 -0.26958168 -1.07058564
[11,] -0.9082374  0.96952049 -0.27897948
[12,]  0.6823136  1.15685832  1.46428187
[13,]  0.8091562 -0.41005981 -1.07768018
[14,]  0.5416286  0.03320871  1.75724879
[15,]  1.8253278  0.27056365  0.70846058
attr(,"scaled:center")
[1] 50.33999 50.06102 49.66551 49.58529 50.44225 50.12196 50.34618
50.11074 49.88811
[10] 50.48823
attr(,"scaled:scale")
[1] 0.9394453 0.9757930 0.8581604 0.8560117 1.1869812 0.8558562
0.9807762 0.6882016
[9] 1.0901550 0.9972455


Also,
> attributes(x) <- NULL
will not work, since it would also drop the dim attribute and turn the matrix into a plain vector.

Thanks,
Mike


Re: [R] How to remove attributes from scale() in a matrix?

2013-07-16 Thread arun
Hi Mike,
If you check ?scale

For ‘scale.default’, the centered, scaled matrix.  The numeric
 centering and scalings used (if any) are returned as attributes
 ‘"scaled:center"’ and ‘"scaled:scale"’

By checking the source code:
methods(scale)

getAnywhere('scale.default')

function (x, center = TRUE, scale = TRUE) 
{
    x <- as.matrix(x)
    nc <- ncol(x)
    if (is.logical(center)) {
        if (center) {
            center <- colMeans(x, na.rm = TRUE)
            x <- sweep(x, 2L, center, check.margin = FALSE)
        }
    }
    else if (is.numeric(center) && (length(center) == nc)) 
        x <- sweep(x, 2L, center, check.margin = FALSE)
    else stop("length of 'center' must equal the number of columns of 'x'")
    if (is.logical(scale)) {
        if (scale) {
            f <- function(v) {
                v <- v[!is.na(v)]
                sqrt(sum(v^2)/max(1, length(v) - 1L))
            }
            scale <- apply(x, 2L, f)
            x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
        }
    }
    else if (is.numeric(scale) && length(scale) == nc) 
        x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
    else stop("length of 'scale' must equal the number of columns of 'x'")
    if (is.numeric(center)) 
        attr(x, "scaled:center") <- center
    if (is.numeric(scale)) 
        attr(x, "scaled:scale") <- scale
    x
}

#You can comment out the last few lines:

scale1<- function (x, center = TRUE, scale = TRUE) 
{
    x <- as.matrix(x)
    nc <- ncol(x)
    if (is.logical(center)) {
        if (center) {
            center <- colMeans(x, na.rm = TRUE)
            x <- sweep(x, 2L, center, check.margin = FALSE)
        }
    }
    else if (is.numeric(center) && (length(center) == nc)) 
        x <- sweep(x, 2L, center, check.margin = FALSE)
    else stop("length of 'center' must equal the number of columns of 'x'")
    if (is.logical(scale)) {
        if (scale) {
            f <- function(v) {
                v <- v[!is.na(v)]
                sqrt(sum(v^2)/max(1, length(v) - 1L))
            }
            scale <- apply(x, 2L, f)
            x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
        }
    }
    else if (is.numeric(scale) && length(scale) == nc) 
        x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
    else stop("length of 'scale' must equal the number of columns of 'x'")
    #if (is.numeric(center)) 
    #    attr(x, "scaled:center") <- center
    #if (is.numeric(scale)) 
    #    attr(x, "scaled:scale") <- scale
    x
}
 x2<-scale1(x,center=TRUE,scale=TRUE)
 str(x2)
# num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...

identical(x1,x2)
#[1] TRUE
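
One practical use of the attributes (not necessarily the documented reason they exist) is that they let you undo the scaling later. A minimal sketch with a stand-in matrix; the names `ctr`, `scl` and `x_back` are made up for illustration.

```r
set.seed(42)
# Stand-in data: 15 x 10 matrix with mean 50
x  <- matrix(rnorm(150, mean = 50), nrow = 15)
xs <- scale(x)

# The column means and standard deviations used by scale()
ctr <- attr(xs, "scaled:center")
scl <- attr(xs, "scaled:scale")

# Multiply each column by its scale, then add back its center
x_back <- sweep(sweep(xs, 2, scl, "*"), 2, ctr, "+")

# Compare values only; x_back may still carry the scaled:* attributes
all.equal(x_back, x, check.attributes = FALSE)
```

With a version like scale1 that drops the attributes, the centers and scales would have to be computed and stored separately to do this.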
A.K.



- Original Message -
From: C W 
To: arun 
Cc: R help 
Sent: Tuesday, July 16, 2013 6:58 PM
Subject: Re: [R] How to remove attributes from scale() in a matrix?

Arun, thanks for the quick response.  That helps.

Why does scale() give attributes?  What's the point of that?  I don't
see apply() or any similar functions do it.  Just for my curiosity.

Mike

On Tue, Jul 16, 2013 at 4:07 PM, arun  wrote:
> HI,
> Try:
> x1<-scale(x,center=TRUE,scale=TRUE)
> str(x1)
> # num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
> # - attr(*, "scaled:center")= num [1:10] 50.2 50 49.8 49.8 50.3 ...
>  #- attr(*, "scaled:scale")= num [1:10] 1.109 0.956 0.817 0.746 1.019 ...
>
>  attr(x1,"scaled:center")<-NULL
>  attr(x1,"scaled:scale")<-NULL
> str(x1)
>  #num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
> A.K.
>
>
>
>
> - Original Message -
> From: C W 
> To: r-help 
> Cc:
> Sent: Tuesday, July 16, 2013 3:59 PM
> Subject: [R] How to remove attributes from scale() in a matrix?
>
> Hi list,
>
> I am using scale() to standardize a distribution?  But why does it
> give me attributes attached to the data?  I just want a standardized
> matrix, that is all.
>
> library(mvtnorm)
>> x <- rmvnorm(15, mean=rep(50, 10))
>> x
>           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]
>   [,8]     [,9]
> [1,] 51.17519 52.34341 49.63084 47.99234 51.63113 50.91391 49.36819
> 49.23901 51.17377
> [2,] 50.57039 49.17210 48.64395 49.03940 49.65761 49.93840 49.94883
> 50.69044 49.57632
> [3,] 50.64811 50.21503 50.13786 49.15879 48.51550 50.19444 50.23710
> 50.98040 51.37032
> [4,] 49.22797 49.66445 49.93287 48.63681 50.49457 50.33302 52.29552
> 49.98424 51.04724
> [5,] 49.72099 50.84510 50.60976 49.60883 53.59509 49.14728 50.23134
> 49.0914

Re: [R] writing multiple lines to a file

2013-07-16 Thread arun
Hi,
Maybe this helps:
printer1<- file("out1.txt","w")
write(sprintf("This is line %d.\n",1),printer1,append=TRUE) 
write("This is line 2",printer1,append=TRUE)
close(printer1)

#or


 printer1<- file("out1.txt","w")
writeLines("This is line",con=printer1,sep="\n")
writeLines("This is line 2",con=printer1)
 close(printer1)
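
For the for-loop case, the same pattern applies: open the connection once in "w" mode, write inside the loop, and close at the end. With file("out.txt") left unopened, each write() call opens and closes the connection itself, which is why only the last line survived. A sketch, using a temporary file and 300 lines as assumed stand-ins:

```r
# Hypothetical output location; replace with "out.txt" as needed
out_path <- tempfile(fileext = ".txt")

con <- file(out_path, "w")   # open once, in write mode
for (i in 1:300) {
  # writeLines() adds the newline itself, so no "\n" in the string
  writeLines(sprintf("This is line %d.", i), con = con)
}
close(con)

lines_back <- readLines(out_path)
length(lines_back)
```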
A.K.


Hello, I am trying to write multiple lines to a file, but I only seem to be 
able to write the last line. 

printer = file("out.txt") 
write(sprintf("This is line %d.\n",1),printer,append=T) 
write("This is line 2.",printer,append=T) 
close(printer) 

How can I fix this? I would like to be able to do this in a for-loop with 
hundreds of elements.



Re: [R] Splitting dataframes and cleaning extraneous characters

2013-07-17 Thread arun
Hi,
You could try:
?split()
split(ats,ats$Project_NBR)
You also mentioned two columns:

split(ats,list(ats$col1, ats$col2))

You should have provided an example dataset using ?dput() ( dput(head(data,10)) 
) for testing.
Also,

gsub("^-[^-]*-","","-005-190")
#[1] "190"
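
The two pieces can be combined: clean the column first, then split on it and rename the list elements. A sketch with made-up data, since no sample was posted; the `ats` rows and the `value` column here are hypothetical.

```r
# Hypothetical stand-in for the poster's `ats` data
ats <- data.frame(Project_NBR = c("-005-190", "-005-191", "-005-190"),
                  value = 1:3,
                  stringsAsFactors = FALSE)

# Strip the leading "-005-" style prefix
ats$Project_NBR <- sub("^-[^-]*-", "", ats$Project_NBR)

# One data frame per project, held in a named list
pieces <- split(ats, ats$Project_NBR)
names(pieces) <- paste0("x", names(pieces))
names(pieces)
```

Individual pieces are then available as pieces$x190, pieces$x191, and so on.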
A.K.




Problem: I have a large data set and need to separate based on factors 
in 2 columns. The final output would be a collection of dataframes 
renamed to 

the corresponding factor levels.   

So far I know that for each corresponding factor I can execute 

x190<-ats[which(Project_NBR=='-005-190'),] 

However there are about 400 factors needing to be separated. 
Also, I would like to remove the "-005-".  Any guidance will be greatly 
appreciated.  



Re: [R] writing multiple lines to a file

2013-07-17 Thread arun
Hi,
No problem.

You could try:

printer = file("out.txt","w")
 writeLines("This is line.",con=printer,sep=" ")
 writeLines("The same line.",con=printer)
 close(printer)

#or
cat(sprintf("This is line %d. ",1),file="out.txt",append=TRUE)
cat("The same line.",file="out.txt",append=TRUE)
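
cat() also works on an open connection, and unlike write() it adds no newline of its own, so the line break goes exactly where you put "\n". A short sketch using a temporary file as an assumed stand-in for out.txt:

```r
# Hypothetical output location; replace with "out.txt" as needed
out_path <- tempfile(fileext = ".txt")

con <- file(out_path, "w")
cat(sprintf("This is line %d.", 1), file = con)  # no newline added
cat(" The same line.\n", file = con)             # newline only where written
close(con)

lines_back <- readLines(out_path)
lines_back
```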

A.K.

Thank you very much, I just have one more simple question. Opening the 
file with "w" worked. However, another problem occurred: when I wrote 
\n, the output skipped two lines, so I had to do this 
without \n:

printer = file("out.txt","w") 
write(sprintf("This is line %d.",1),printer,append=T) 
write("This is line 2.",printer,append=T) 
close(printer) 


However, sometimes, I do not want to start on the new line, 
it depends on the situation. That is I may write something to a file. 
And then I want to add to the same line a new string: 
" The same line." Like this. 

printer = file("out.txt","w") 
write(sprintf("This is line %d.",1),printer,append=T) 
write(" The same line.",printer,append=T) 
close(printer) 

But the output is: 
This is line 1. 
 The same line. 


How can I make it stop going to a new line automatically?


- Original Message -
From: arun 
To: R help 
Cc: 
Sent: Tuesday, July 16, 2013 10:53 PM
Subject: Re: writing multiple lines to a file

Hi,
Maybe this helps:
printer1<- file("out1.txt","w")
write(sprintf("This is line %d.\n",1),printer1,append=TRUE) 
write("This is line 2",printer1,append=TRUE)
close(printer1)

#or


 printer1<- file("out1.txt","w")
writeLines("This is line",con=printer1,sep="\n")
writeLines("This is line 2",con=printer1)
 close(printer1)
A.K.


Hello, I am trying to write multiple lines to a file, but I only seem to be 
able to write the last line. 

printer = file("out.txt") 
write(sprintf("This is line %d.\n",1),printer,append=T) 
write("This is line 2.",printer,append=T) 
close(printer) 

How can I fix this? I would like to be able to do this in a for-loop with 
hundreds of elements.



Re: [R] Splitting dataframes and cleaning extraneous characters

2013-07-17 Thread arun
Hi,
One problem with using ?substring() would be that it depends on the number of digits, 
characters, etc.  

For eg.
substring("-005-190",6)
#[1] "190"
 substring("-0057-190",6)
#[1] "-190"

#whereas

gsub("^-[^-]*-","","-0057-190")
#[1] "190"

Probably, your dataset doesn't have that sort of problem.

dat1<- read.table(text="
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-split(dat1,gsub("\\.","",as.character(interaction(dat1[,2],dat1[,1]))))
res
#$k134
#  project boro
#2     134    k
#7     134    k
#
#$m123
#  project boro
#1     123    m
#3     123    m
#4     123    m
#
#$q543
#  project boro
#5     543    q
#6     543    q
 str(res$k134)
#'data.frame':    2 obs. of  2 variables:
# $ project: int  134 134
# $ boro   : chr  "k" "k"
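
If separate objects named m123, k134, etc. are really wanted (keeping the list is usually easier to work with), list2env() can copy the list elements into an environment. A sketch only; the environment name `subsets` is made up, and paste0() is used here as a simpler way to build the same keys.

```r
dat1 <- data.frame(project = c(123, 134, 123, 123, 543, 543, 134),
                   boro = c("m", "k", "m", "m", "q", "q", "k"),
                   stringsAsFactors = FALSE)

# Keys such as "m123": boro followed by project
res <- split(dat1, paste0(dat1$boro, dat1$project))

# Copy each list element into its own variable inside a new environment
subsets <- new.env()
list2env(res, envir = subsets)

sort(ls(subsets))
nrow(subsets$m123)
```

Passing envir = globalenv() instead would create the data frames in the workspace.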
A.K.



I was able to split the extraneous stuff using 

a<-substring(Project_NBR, first=6) 

and then cbind to add the edited column to the df. I have a 
sample but I am not sure how to provide it to you. I will try to produce
 an example that's similar to what I have: 

project boro 
123 m 
134 k 
123 m 
123 m 
543 q 
543 q 
134 k 


Basically I am trying to subset the data frame according to 
project and boro with the name of the subset being boro-project (ex. 
m123, k134) 

I hope this provides more clarity to my problem. 


- Original Message -
From: arun 
To: R help 
Cc: 
Sent: Wednesday, July 17, 2013 11:06 AM
Subject: Re: Splitting dataframes and cleaning extraneous characters

Hi,
You could try:
?split()
split(ats,ats$Project_NBR)
You also mentioned two columns:

split(ats,list(ats$col1, ats$col2))

You should have provided an example dataset using ?dput() ( dput(head(data,10)) 
) for testing.
Also,

gsub("^-[^-]*-","","-005-190")
#[1] "190"
A.K.




Problem: I have a large data set and need to separate based on factors 
in 2 columns. The final output would be a collection of dataframes 
renamed to 

the corresponding factor levels.   

So far I know that for each corresponding factor I can execute 

x190<-ats[which(Project_NBR=='-005-190'),] 

However there are about 400 factors needing to be separated. 
Also, I would like to remove the "-005-".  Any guidance will be greatly 
appreciated.  



Re: [R] simplify a dataframe

2013-07-17 Thread arun
Hi,
You could try:

df1[,1:2]<-lapply(df1[,1:2],as.character)
 df2New<- data.frame(Deb=unique(with(df1,ave(Debut,INDX,FUN=function(x) head(x,1)))),
                     Fin=unique(with(df1,ave(Fin,INDX,FUN=function(x) tail(x,1)))))
identical(df2New,df2)
#[1] TRUE
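
Note that unique() would merge two groups that happen to share the same first (or last) date. A sketch of an alternative that takes the first Debut and last Fin of each INDX group explicitly, keeping the groups in their order of first appearance; the name `grp` and the result name `df2_alt` are made up here.

```r
df1 <- data.frame(
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995",
            "28/02/1995", "01/03/1995", "13/03/1995", "01/01/1996",
            "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995",
            "28/02/1995", "12/03/1995", "30/06/1995", "30/01/1996",
            "31/01/1996"),
  INDX  = c(6, 6, 6, 11, 11, 11, 4, 5, 5),
  stringsAsFactors = FALSE)

# Factor with levels in order of first appearance (6, 11, 4, 5),
# so split() does not reorder the groups numerically
grp <- factor(df1$INDX, levels = unique(df1$INDX))

# One row per group: first Debut, last Fin
pieces  <- split(df1, grp)
df2_alt <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(Deb = d$Debut[1], Fin = d$Fin[nrow(d)],
             stringsAsFactors = FALSE)
}))
rownames(df2_alt) <- NULL
df2_alt
```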

A.K.


- Original Message -
From: Arnaud Michel 
To: Rui Barradas ; R help ; arun 

Cc: 
Sent: Wednesday, July 17, 2013 4:03 PM
Subject: Re: [R] simplify a dataframe

Thank you for the answer to question (1).
Sorry for the imprecision in question (2):
Suppose the data frame df1
df1 <- data.frame(
Debut =c ( "24/01/1995", "01/05/1997" ,"31/12/1997", "02/02/1995" 
,"28/02/1995"
,"01/03/1995", "13/03/1995", "01/01/1996", "31/01/1996") ,
Fin = c ( "30/04/1997", "30/12/1997" ,"31/12/1997", "27/02/1995", 
"28/02/1995",
"12/03/1995", "30/06/1995", "30/01/1996", "31/01/1996") ,
INDX = c(6,6,6,  11,11,11, 4,  5,5) )


I would like to replace df1 by df2

df2 <- data.frame(
Deb  = c("24/01/1995",     "02/02/1995",     "13/03/1995",
"01/01/1996") ,
Fin  = c("31/12/1997", "12/03/1995",     "30/06/1995",
"31/01/1996") )

Explanation:
Lines 1, 2 and 3 of df1 (which share INDX = 6) are replaced 
by a single line with
Deb of df2 = Debut of line 1 of df1
Fin of df2 = Fin of line 3 of df1

Lines 4, 5 and 6 of df1 (which share INDX = 11) are replaced 
by a single line with
Deb of df2 = Debut of line 4 of df1
Fin of df2 = Fin of line 6 of df1

Line 7 of df1 (the only line with INDX = 4) is replaced by a single 
line with
Deb of df2 = Debut of line 7 of df1
Fin of df2 = Fin of line 7 of df1
==> No change

Lines 8 and 9 of df1 (which share INDX = 5) are replaced by 
a single line with
Deb of df2 = Debut of line 8 of df1
Fin of df2 = Fin of line 9 of df1

df1
        Debut        Fin INDX
1 24/01/1995 30/04/1997    6
2 01/05/1997 30/12/1997    6
3 31/12/1997 31/12/1997    6
4 02/02/1995 27/02/1995   11
5 28/02/1995 28/02/1995   11
6 01/03/1995 12/03/1995   11
7 13/03/1995 30/06/1995    4
8 01/01/1996 30/01/1996    5
9 31/01/1996 31/01/1996    5

df2
         Deb        Fin
1 24/01/1995 31/12/1997
2 02/02/1995 12/03/1995
3 13/03/1995 30/06/1995
4 01/01/1996 31/01/1996

Thank you for your help
Michel

On 17/07/2013 19:57, Rui Barradas wrote:
> Hello,
>
> As for question (1), try the following.
>
>
> y2 <- cumsum(c(TRUE, diff(x1) > 0))
> identical(as.integer(y1), y2)  # y1 is of class "numeric"
>
>
> As for question (2) I'm not understanding it.
>
> Hope this helps,
>
> Rui Barradas
>
> On 17-07-2013 18:21, Arnaud Michel wrote:
>> Hi Arun
>>
>> I have two more questions, still about simplifying a data frame
>>
>> I would like
>> 1)  to transform the vector x1 into the vector y1
>> x1 <- c(1,1,1,-1000,         1,-1000, 1,1,1,1,1,1,-1000)
>> y1 <- c(1,1,1,1,                    2,2, 3,3,3,3,3,3,3)
>>
>>
>> 2) to transform the vectors Debut and Fin by taking into account INDX
>> into the two vectors Deb and Fin
>> Debut <- c (
>> "24/01/1995", "01/05/1997" ,"31/12/1997", "02/02/1995" ,"28/02/1995"
>> ,"01/03/1995",
>> "13/03/1995", "01/01/1996", "31/01/1996", "24/01/1995", "01/07/1995"
>> ,"01/09/1995",
>>    "01/07/1997", "01/01/1998", "01/08/1998", "01/01/2000",
>> "17/01/2000","29/02/2000")
>>
>> Fin <- c (
>> "30/04/1997", "30/12/1997" ,"31/12/1997", "27/02/1995", "28/02/1995",
>> "12/03/1995",
>> "30/06/1995", "30/01/1996", "31/01/1996", "30/06/1995", "31/08/1995",
>> "30/06/1997",
>> "31/12/1997", "31/07/1998", "31/12/1999", "16/01/2000", "28/02/2000",
>> "29/02/2000")
>>
>> INDX <- c(6,6,6,                    11,11,11, 4,        5,5)
>>
>>
>> Deb  <- c("24/01/1995", "02/02/1995", "13/03/1995",
>> "01/01/1996")
>> Fin  <- c("31/12/1997", "12/03/1995", "30/06/1995",
>> "31/01/1996")
>>
>>
>>        Debut        Fin INDX
>> *24/01/1995* 30/04/1997    6
>> 01/05/1997 30/12/1997    6
>> 31/12/1997 *31/12/1997*    6
>> *02/02/1995* 27/02/1995   11
>> 28/02/1995 28/02/1995   11
>> 01/03/1995 *12/03/1995*   11
>> *13/03/1995* *30/06/1995*    4
>> *01/01/1996* 30/01/1996    5
>> 31/01/1996 *31/01/1996*    5
>> 
>>
>> Thanks for your help
>>
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simplify a dataframe

2013-07-17 Thread arun
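# or, with plyr:
library(plyr)
# keep the first Debut and the last Fin within each INDX group
res<-ddply(df1,.(INDX),summarize,Debut=head(Debut,1),Fin=tail(Fin,1))
# ddply sorts the groups by INDX; restore the order in which they appear in df1
res$INDX<-factor(res$INDX,levels=unique(df1$INDX))
res[order(res$INDX),-1]
#       Debut        Fin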
#3 24/01/1995 31/12/1997
#4 02/02/1995 12/03/1995
#1 13/03/1995 30/06/1995
#2 01/01/1996 31/01/1996
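
The same collapse can also be sketched in base R by combining it with the cumsum() idea from question (1). This is only a sketch, not the method used above: the names run, first, last and df2_sketch are made up for illustration, and it assumes that rows belonging together are consecutive, so a new group starts whenever INDX changes from one row to the next.

```r
df1 <- data.frame(
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995", "28/02/1995",
            "01/03/1995", "13/03/1995", "01/01/1996", "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", "28/02/1995",
            "12/03/1995", "30/06/1995", "30/01/1996", "31/01/1996"),
  INDX  = c(6, 6, 6, 11, 11, 11, 4, 5, 5),
  stringsAsFactors = FALSE
)

# run id: increases by one each time INDX differs from the previous row
run <- cumsum(c(TRUE, diff(df1$INDX) != 0))

first <- !duplicated(run)                   # first row of each run
last  <- !duplicated(run, fromLast = TRUE)  # last row of each run

# hypothetical result name; one row per run, in the original row order
df2_sketch <- data.frame(Deb = df1$Debut[first], Fin = df1$Fin[last],
                         stringsAsFactors = FALSE)
df2_sketch
```

Because the runs are taken in row order, no reordering step is needed afterwards.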
A.K.



- Original Message -
From: arun 
To: Arnaud Michel 
Cc: R help ; Rui Barradas 
Sent: Wednesday, July 17, 2013 4:14 PM
Subject: Re: [R] simplify a dataframe

Hi,
You could try:

df1[,1:2]<-lapply(df1[,1:2],as.character)
df2New<- data.frame(Deb=unique(with(df1,ave(Debut,INDX,FUN=function(x) head(x,1)))),
                    Fin=unique(with(df1,ave(Fin,INDX,FUN=function(x) tail(x,1)))))
identical(df2New,df2)
#[1] TRUE
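
One caveat with the unique(ave(...)) idiom, shown as a small sketch: unique() works on the values, so it gives one entry per group only when the first Debut (and the last Fin) differ between groups, as they do in df1. The toy data frame and the names deb_unique, deb_first and fin_last below are hypothetical and only illustrate the point.

```r
# Hypothetical toy data: both groups start on the same date
toy <- data.frame(
  Debut = c("01/01/2000", "01/02/2000", "01/01/2000", "01/03/2000"),
  Fin   = c("31/01/2000", "28/02/2000", "31/01/2000", "31/03/2000"),
  INDX  = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# unique() on the ave() result merges the two identical start dates
deb_unique <- unique(with(toy, ave(Debut, INDX, FUN = function(x) head(x, 1))))

# taking one value per group keeps both groups
# (split() orders the groups by sorted INDX value)
deb_first <- vapply(split(toy$Debut, toy$INDX), function(x) head(x, 1), character(1))
fin_last  <- vapply(split(toy$Fin, toy$INDX), function(x) tail(x, 1), character(1))

length(deb_unique)  # fewer values than groups
length(deb_first)   # one value per group
```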

A.K.



Re: [R] cut into groups of equal nr of elements...

2013-07-17 Thread arun
Hi,
Not sure whether this is what you wanted.


 vec1<- 1:7
 fun1<- function(x,nr) {((x-1)%/%nr)+1}
 fun1(vec1,2)
#[1] 1 1 2 2 3 3 4
 fun1(vec1,3)
#[1] 1 1 1 2 2 2 3
split(vec1,fun1(vec1,2))

A.K.



- Original Message -
From: Witold E Wolski 
To: r-help@r-project.org
Cc: 
Sent: Wednesday, July 17, 2013 5:43 PM
Subject: [R] cut into groups of equal nr of elements...

I would like to "cut" a vector into groups with an equal number of elements.
I am looking for a function along the lines of cut, but one where I can specify
the size of the groups instead of the number of groups.




--
Witold Eryk Wolski


Re: [R] cut into groups of equal nr of elements...

2013-07-17 Thread arun
Sorry, there was a mistake: fun1 should group by position (seq_along(x)),
not by the values of x:
fun1<- function(x,nr) {((seq_along(x)-1)%/%nr)+1}

vec3<- c(4,5,7,9,8,5)
 fun1(vec3,2)
#[1] 1 1 2 2 3 3

split(vec3,fun1(vec3,2))
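
For illustration, the same position-based grouping can be written with ceiling(). This is a sketch only: group_id and chunk_size are hypothetical names, and the last group is simply shorter when the length of the vector is not a multiple of the group size.

```r
# group ids by position; chunk_size is the number of elements per group
group_id <- function(x, chunk_size) ceiling(seq_along(x) / chunk_size)

vec3 <- c(4, 5, 7, 9, 8, 5, 1)

# list of groups, named "1", "2", "3"; the last one holds the leftover element
chunks <- split(vec3, group_id(vec3, 3))
chunks
```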


A.K.





Re: [R] combine select data from 2 dataframes sharing same variables

2013-07-17 Thread arun
Hi,
Not sure if this is what you wanted:
#If columns are arranged in the same order in both data.frames.

lst1<-lapply(seq_len(ncol(StatsUTAH)),function(i) {
  x1<-cbind(StatsUTAH[,i],sStatsUTAH[,i])
  row.names(x1)<-row.names(StatsUTAH)
  colnames(x1)<-c("zeroNO","zeroYES")
  x1
})
 names(lst1)<- colnames(StatsUTAH)
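
Since the question mentions picking only some variables, here is a hedged sketch that builds the tables by column name instead of column position, so the two data frames do not need the same column order. It uses a cut-down copy of the data (first three rows, two columns); vars and tabs are hypothetical names.

```r
# cut-down copies of the two data frames from the question
StatsUTAH <- data.frame(MWtotaleesDue = c(8.428571, 2.496256, 7),
                        OTtotaleesDue = c(6.6, 2.242023, 3),
                        row.names = c("Mean", "StdError", "Median"))
sStatsUTAH <- data.frame(MWtotaleesDue = c(0.9076923, 0.411799, 0),
                         OTtotaleesDue = c(1.0153846, 0.4442433, 0),
                         row.names = c("sMean", "sStdError", "sMedian"))

# hypothetical selection of the variables that should get a table
vars <- c("OTtotaleesDue")

# one two-column matrix per selected variable, matched by column name
tabs <- lapply(setNames(vars, vars), function(v) {
  tab <- cbind(zeroNO = StatsUTAH[[v]], zeroYES = sStatsUTAH[[v]])
  rownames(tab) <- rownames(StatsUTAH)
  tab
})
tabs$OTtotaleesDue
```

A single table can then be pulled out by name, as in the last line.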

A.K.



- Original Message -
From: bcrombie 
To: r-help@r-project.org
Cc: 
Sent: Wednesday, July 17, 2013 4:12 PM
Subject: [R] combine select data from 2 dataframes sharing same variables

# The following dataframes are the result of two analyses performed on
# the same set of numeric data.
# The first analysis involved calculations that did not include zero values:
StatsUTAH = data.frame(MWtotaleesDue =
c(8.428571,2.496256,7,6.604472,1,17,3.593998,4.834573,12.02257),
                       OTtotaleesDue =
c(6.6,2.242023,3,7.089899,1,23,3.100782,3.499218,9.700782),
                       OTtotalBWsDue =
c(559.944,305.7341,257.55,966.816,15.19,3232.97,422.839,137.105,982.783),
                       TotalBWsFD =
c(693.2973,265.0846,267.58,1026.6682,15.19,3232.97,356.5468,336.7505,1049.8442))
rownames(StatsUTAH)<- c("Mean","StdError", "Median", "StdDev", "Min", "Max",
"NinetyPct", "NinetyPctLower", "NinetyPctUpper")
StatsUTAH

# The second analysis involved calculations that included zero values:
sStatsUTAH = data.frame(MWtotaleesDue =
c(0.9076923,0.411799,0,3.3200295,0,17,0.5332467,0.3744456,1.440939),
                        OTtotaleesDue =
c(1.0153846,0.4442433,0,3.5816036,0,23,0.5752594,0.4401252,1.590644),
                        OTtotalBWsDue =
c(86.14523,51.5752,0,415.81256,0,3232.97,66.78575,19.35948,152.93098),
                        TotalBWsFD =
c(159.99169,69.86036,0,563.23225,0,3232.97,90.46357,69.52812,250.45526))
rownames(sStatsUTAH)<- c("sMean","sStdError", "sMedian", "sStdDev", "sMin",
"sMax", "sNinetyPct", "sNinetyPctLower", "sNinetyPctUpper")
sStatsUTAH

# Rows 1-9 may have different names in each dataframe, but they are the same
# corresponding calculation in both.

# I need to combine these data so that the OUTPUT is a SEPARATE table
# (or matrix or whatever)
# FOR EACH VARIABLE SHARED BY THE DATAFRAMES that I can place in a Word
# document (which I can handle later with RTF).
# This is how I've mapped it out in my head, but I need to convert it to R
# code:
# StatsUTAH ---data for "zeroNO"
# sStatsUTAH ---data for "zeroYES"
# 
# Table 1: MWtotaleesDue
# colnames("zeroNO", "zeroYES")
# rownames("Mean","StdError", "Median", "StdDev", "Min", "Max", "NinetyPct",
# "NinetyPctLower", "NinetyPctUpper")
# 
# Table 2: OTtotaleesDue
# same colnames & rownames as Table 1
# 
# Table 3: OTtotalBWsDue
# same colnames & rownames as Table 1
# 
# Table 4: TotalBWsFD
# same colnames & rownames as Table 1

# WHAT IS THE BEST WAY TO DO THIS IN R?
# While a loop may be more efficient, is there also a good way to create each
# table separately?
# Note: my real dataframes (StatsUTAH, etc.) will have a lot more variables
# than what are listed in this example,
# so I will probably be picking and choosing which ones I'm interested in
# creating tables for.



--
View this message in context: 
http://r.789695.n4.nabble.com/combine-select-data-from-2-dataframes-sharing-same-variables-tp4671790.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Merge with transposed matrix.

2013-07-18 Thread arun
Hi,

m1<- matrix(NA,5,5)
m1[upper.tri(m1)]<-c(2,3,8,4,9,14,5,10,15,20)
One way would be:
m1[lower.tri(m1)]<-t(m1)[lower.tri(t(m1))]
 m1
#     [,1] [,2] [,3] [,4] [,5]
#[1,]   NA    2    3    4    5
#[2,]    2   NA    8    9   10
#[3,]    3    8   NA   14   15
#[4,]    4    9   14   NA   20
#[5,]    5   10   15   20   NA
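
A slightly shorter variant, given as a sketch: for a square matrix lower.tri(m1) and lower.tri(t(m1)) select the same positions, so one logical mask can be reused on both sides. The name low is made up for illustration.

```r
m1 <- matrix(NA_real_, 5, 5)
m1[upper.tri(m1)] <- c(2, 3, 8, 4, 9, 14, 5, 10, 15, 20)

# positions below the diagonal (the same mask is valid for m1 and t(m1))
low <- lower.tri(m1)

# copy each upper-triangle value to its mirrored position
m1[low] <- t(m1)[low]

# the filled matrix now equals its own transpose (NA diagonal included)
identical(m1, t(m1))
```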
A.K. 



Hello ! 

I would like some simple syntax to fill my matrix with its
transpose.
That is, as an example, I have a correlation matrix like this, and its
transpose:

> matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    2    3    4    5
[2,]   NA   NA    8    9   10
[3,]   NA   NA   NA   14   15
[4,]   NA   NA   NA   NA   20
[5,]   NA   NA   NA   NA   NA

> transposed.matrix<-t(matrix)

     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]    2   NA   NA   NA   NA
[3,]    3    8   NA   NA   NA
[4,]    4    9   14   NA   NA
[5,]    5   10   15   20   NA


And I would like to have

     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    2    3    4    5
[2,]    2   NA    8    9   10
[3,]    3    8   NA   14   15
[4,]    4    9   14   NA   20
[5,]    5   10   15   20   NA


Thank you very much for your help !!
