Re: [R] Scatter plot for repeated measures
Not sure whether it is a scatterplot or just a plot with 3 lines. If it is the latter:

    library(reshape2)
    matplot(acast(my.df, TIME~ID, value.var='X'), type='l', col=1:3, ylab='X', xlab='TIME')
    legend('bottomright', inset=.05, legend=LETTERS[1:3], pch=1, col=1:3)

A.K.

On Friday, December 5, 2014 5:45 PM, farnoosh sheikhi wrote:

Hi Arun,

I hope you are doing well. I have a data set as follows:

    my.df <- data.frame(ID=rep(c("A","B","C"), 5), TIME=rep(1:5, each=3), X=1:5)

I would like to get a scatterplot where the x axis is Time (1,2,3,4,5) and the y axis is X, but I want three separate lines, one for each ID. I basically want to track each ID over time. Is this possible?

Thanks a lot and Happy Holidays to you!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] mean calculation
Hi Juvin,

The error "dim(X) must have a positive length" usually shows up when you pass a vector to apply(), i.e.

    apply(1:5, 2, mean)
    #Error in apply(1:5, 2, mean) : dim(X) must have a positive length

Also, if your dataset originally has 1206 columns, it is not clear why you needed the code below ("rainfall" is already a data.frame):

    precip=data.frame(rainfall[1:1206])

Based on the data provided,

    rainfall <- read.table(text="1 2 3 4 5 6 7 8 9 10 11
    NA 0 0 0 0 12 0 0 0 0 0
    NA 0 0 0 0 0 0 0 0 0 0
    NA 0 0 0 0 14 0 0 0 0 5
    NA 0 0 0 0 0 0 0 0 0 0
    NA 0 0 27 0 0 0 0 20 0 165
    NA 0 88 38 0 0 0 0 0 0 26
    NA 12 12 0 0 0 0 0 0 0 2
    NA 2 0 0 0 0 0 0 0 0 0
    NA 2 0 0 0 0 0 0 0 0 0
    NA 0 24 1 0 0 0 0 3 0 62
    NA 26 0 0 0 0 0 0 0 0 33", sep="", header=TRUE, check.names=FALSE)

    apply(rainfall, 2, function(x) c(mean=mean(x, na.rm=TRUE), median=median(x, na.rm=TRUE), max=max(x, na.rm=TRUE)))
    #          1         2        3  4 5         6 7 8         9 10        11
    #mean    NaN  3.818182 11.27273  6 0  2.363636 0 0  2.090909  0  26.63636
    #median   NA  0.00      0.0      0 0  0.00     0 0  0.00      0   2.0
    #max    -Inf 26.00     88.0     38 0 14.00     0 0 20.00      0 165.0

Or using `colMaxs`, `colMedians` from `matrixStats` (both expect a matrix, hence as.matrix()):

    library(matrixStats)
    rbind(mean=colMeans(rainfall, na.rm=TRUE),
          median=colMedians(as.matrix(rainfall), na.rm=TRUE),
          max=colMaxs(as.matrix(rainfall), na.rm=TRUE))

Another option would be to use `summarise_each` from `dplyr`:

    library(dplyr)
    rainfall %>% summarise_each(funs(mean=mean(., na.rm=TRUE), median=median(., na.rm=TRUE), max=max(., na.rm=TRUE)))

A.K.

I tried to calculate a mean from a csv table by forming a data frame, but it says dim(X) must have a positive length. The table has 1206 columns and 31 rows. I want to calculate the mean, median, and maximum from the table. The table has some NA values which I don't want to include. The table looks as follows:

    1  2  3  4  5  6  7  8  9  10 11
    NA 0  0  0  0  12 0  0  0  0  0
    NA 0  0  0  0  0  0  0  0  0  0
    NA 0  0  0  0  14 0  0  0  0  5
    NA 0  0  0  0  0  0  0  0  0  0
    NA 0  0  27 0  0  0  0  20 0  165
    NA 0  88 38 0  0  0  0  0  0  26
    NA 12 12 0  0  0  0  0  0  0  2
    NA 2  0  0  0  0  0  0  0  0  0
    NA 2  0  0  0  0  0  0  0  0  0
    NA 0  24 1  0  0  0  0  3  0  62
    NA 26 0  0  0  0  0  0  0  0  33

I used the following code to calculate the mean:

    rainfall=read.table('bmark.csv',header=T,sep=',')
    precip=data.frame(rainfall[1:1206])
    monthlyMean=apply(precip, MARGIN=2,FUN=mean,na.rm=TRUE)

Any help would be appreciated.

Juvin
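The two points in the reply (apply() needs dimensions, and NA values are skipped with na.rm=TRUE) can be tried on a tiny table. This is only a base R sketch: the data frame `rain` and its column names d1..d3 are made up for illustration and are not Juvin's file.

```r
# Hypothetical miniature of the rainfall table (names d1..d3 are made up):
# d1 is entirely NA, d3 has one missing value.
rain <- data.frame(d1 = c(NA, NA, NA), d2 = c(12, 2, 26), d3 = c(0, 5, NA))

# apply() needs an object with dimensions; a bare vector has none,
# so this call fails and we capture the error message instead.
err <- tryCatch(apply(1:5, 2, mean), error = function(e) conditionMessage(e))

# Skip columns without any observed value, then summarise the rest column by column.
has_data <- colSums(!is.na(rain)) > 0
stats <- sapply(rain[has_data], function(x)
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE)))
print(stats)
```

Dropping the all-NA column first is a design choice of this sketch; it avoids the NaN and -Inf entries (and the warning from max()) that appear for column 1 in the output above.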
Re: [R] Is there a way to map data from Binary format to Numerical numbers?
Try:

    #mat is the 0/1 matrix
    indx <- which(!!mat, arr.ind=TRUE)
    v1 <- unname(sapply(split(indx[,2], indx[,1]), toString))
    cat(paste(v1, collapse="\n"), sep="\n")
    1, 2, 3, 6, 7, 8, 9
    1, 2, 3, 6, 8, 9
    1, 3, 4, 6, 7, 8, 9
    1, 8
    1, 3, 6, 7, 8, 9
    1, 3, 4, 6, 8, 9
    1, 3, 5, 9

A.K.

Hi,

Is there a way to map data from binary format to numerical numbers? Example: I have text files where each record consists of several items (9 items).

1 means the item appears
0 means the item is absent

    1,1,1,0,0,1,1,1,1
    1,1,1,0,0,1,0,1,1
    1,0,1,1,0,1,1,1,1
    1,0,0,0,0,0,0,1,0
    1,0,1,0,0,1,1,1,1
    1,0,1,1,0,1,0,1,1
    1,0,1,0,1,0,0,0,1

I want to transform my data to numerical numbers in ascending order, such that when an item is absent I don't print it but keep increasing the counter. For example, the above binary format will be:

    1,2,3,6,7,8,9
    1,2,3,6,8,9
    1,3,4,6,7,8,9
    1,8
    1,3,6,7,8,9
    1,3,4,5,7,8
    1,3,5,9
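The same mapping can be sketched row by row, which may be easier to follow than the which(arr.ind=TRUE) route. The matrix below holds three of the posted records; reading it from a text file is left out, so `mat` here is an assumption about how the data would look once loaded.

```r
# Three of the posted records as a 0/1 matrix (assumed layout after reading the file).
mat <- matrix(c(1, 1, 1, 0, 0, 1, 1, 1, 1,
                1, 0, 0, 0, 0, 0, 0, 1, 0,
                1, 0, 1, 0, 1, 0, 0, 0, 1),
              nrow = 3, byrow = TRUE)

# For each row, which() returns the positions of the items that are present;
# toString() joins them with ", ".
items <- apply(mat == 1, 1, function(present) toString(which(present)))
cat(items, sep = "\n")
```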
Re: [R] Difference in dates for unique ID
Hi Farnoosh,

Not sure I understand the expected output. The difference between the first 2 dates is 136 days. Maybe this helps:

    library(data.table)
    dcast.data.table(setDT(df)[, list(Visit=.N,
            Diff=as.numeric(abs(diff(as.Date(Date, format='%d-%b-%y'))))), by = ID],
        ID+Visit~Diff, value.var='Diff', length)
       ID Visit 136 255 857
    1:  1     2   1   0   0
    2:  2     3   0   1   1

On Wednesday, February 11, 2015 5:47 PM, farnoosh sheikhi wrote:

Hi Arun,

I have a data set that looks like below. I wanted to get the difference in dates for each unique ID, record it as a new X, and have a binary input for each one.

    ID Date
    1  06-Sep-13
    1  20-Jan-14
    2  06-Mar-12
    2  25-Jun-11
    2  29-Oct-13

For example, for the first two dates for ID=1 (20-Jan-14 - 06-Sep-13 ~ 121), I want the data to be as follows:

    ID Visit 121
    1  2     1
    2  3     0

I would really appreciate it if you could help me with this. I know I need to write some kind of loop, but I don't know how to think through the logic behind it. Thanks a lot.

Farnoosh
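The per-ID differences themselves can be checked without data.table. In this base R sketch the dates are rewritten in ISO format (an assumption made here so that parsing does not depend on the locale, which '%b' month abbreviations do).

```r
# The posted data, with dates rewritten as YYYY-MM-DD (assumption for locale-independent parsing).
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 Date = c("2013-09-06", "2014-01-20", "2012-03-06", "2011-06-25", "2013-10-29"))

# Split the dates by ID and take absolute differences between consecutive rows, in days.
gaps <- lapply(split(as.Date(df$Date), df$ID),
               function(d) as.numeric(abs(diff(d))))
print(gaps)
```

The result is a list with one numeric vector per ID; reshaping it into indicator columns is what the dcast step in the reply does.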
Re: [R] 1st el of a list of vectors
Or

    rapply(l, function(x) x[1])
    #[1] 1 3 7

    set.seed(42)
    l1 <- replicate(1e6, list(sample(1:5, sample(8), replace=T)))
    system.time(r1 <- sapply(l1, `[`, 1))
    #   user  system elapsed
    #  1.324   0.000   1.326
    system.time(r2 <- rapply(l1, function(x) x[1]))
    #   user  system elapsed
    #  0.736   0.004   0.741
    identical(r1, r2)
    #[1] TRUE

    #elementLengths() is not base R (it comes from Bioconductor's IRanges/S4Vectors);
    #base R >= 3.2.0 has lengths() for the same purpose
    system.time({
        eltlens <- elementLengths(l1)
        r3 <- unlist(l1, use.names=FALSE)[cumsum(eltlens) - eltlens + 1L]
    })
    #   user  system elapsed
    #  0.153   0.000   0.154

A.K.

On Tuesday, July 22, 2014 12:11 AM, Richard M. Heiberger wrote:

    l = list(c(1,2), c(3,5,6), c(7))
    sapply(l, `[`, 1)

On Mon, Jul 21, 2014 at 3:55 PM, carol white wrote:
> Hi,
> If we have a list of vectors of different lengths, how is it possible to
> retrieve the first element of the vectors of the list?
>
> l = list(c(1,2), c(3,5,6), c(7))
>
> 1,3,7 should be retrieved
>
> Thanks
>
> Carol
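The fastest variant above works by flattening the list once and indexing the start position of each vector. A small base R sketch of that idea, using lengths() (available from R 3.2.0) in place of elementLengths(), and assuming no vector in the list is empty:

```r
l <- list(c(1, 2), c(3, 5, 6), c(7))

# Straightforward version: take element 1 of each vector, with a type check.
first_vapply <- vapply(l, function(x) x[1], numeric(1))

# Offset version: the first element of vector i sits right after the
# cumulative length of vectors 1..(i-1) in the flattened list.
len <- lengths(l)                    # 2 3 1
starts <- cumsum(len) - len + 1L     # 1 3 6
first_offset <- unlist(l, use.names = FALSE)[starts]

print(first_vapply)
print(first_offset)
```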
Re: [R] filter one entry, in dependence of date
Hi,

If `dat` is the dataset:

    dat[!(dat$ID==2 & as.numeric(gsub("-.*","",dat$Month))<5),]
      ID   Month Value
    1  1 03-2014     1
    2  1 04-2014    10
    3  1 05-2014    50
    6  2 05-2014     4
    7  2 06-2014     2

A.K.

Hello together,

I have a short question, maybe you can help me. I have a data.frame like this one:

      ID   Month Value
    1  1 03-2014     1
    2  1 04-2014    10
    3  1 05-2014    50
    4  2 03-2014     8
    5  2 04-2014     7
    6  2 05-2014     4
    7  2 06-2014     2

I now want to create another data.frame without the lines from ID==2 which are earlier than 05-2014. The solution should look like this one:

      ID   Month Value
    1  1 03-2014     1
    2  1 04-2014    10
    3  1 05-2014    50
    4  2 05-2014     4
    5  2 06-2014     2

Maybe you can help me.

Best regards.
Mat
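The one-liner in the reply compares the month number only, which is enough when all rows fall in the same year. A sketch that compares full dates instead, assuming Month is always "MM-YYYY" (the cutoff date and the helper name `month_start` are choices made for this example):

```r
dat <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2),
                  Month = c("03-2014", "04-2014", "05-2014",
                            "03-2014", "04-2014", "05-2014", "06-2014"),
                  Value = c(1, 10, 50, 8, 7, 4, 2))

# Turn "MM-YYYY" into the first day of that month so years are compared too.
month_start <- as.Date(paste0("01-", dat$Month), format = "%d-%m-%Y")

# Drop rows of ID 2 that lie before May 2014; everything else is kept.
res <- dat[!(dat$ID == 2 & month_start < as.Date("2014-05-01")), ]
print(res)
```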
Re: [R] corresponding replicated el of one matrix in another matrix or vector
Try:

    rbind(v2, unname(setNames(v1[,1], v1[,2])[v2]))
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
    v2 "a"  "a"  "a"  "a"  "a"  "c"  "c"  "c"  "c"  "c"   "c"   "c"   "c"   "c"
       "1"  "1"  "1"  "1"  "1"  "3"  "3"  "3"  "3"  "3"   "3"   "3"   "3"   "3"
       [,15] [,16] [,17] [,18]
    v2 "c"   "b"   "b"   "b"
       "3"   "2"   "2"   "2"

A.K.

Hi,

I have a matrix of unique elements (strings) like v1 and a vector which contains replicated values of the 2nd column of the first matrix.

    v1 = cbind(c("1","2","3"), c("a","b","c"))
    v2 = c(rep("a",5), rep("c",10), rep("b",3))

How can I add a column to v2 that contains the values of the first column of the first matrix v1 where the 2nd column of v1 matches the values of v2? Do I need to grep by looping over the nrow of v1, which is very time consuming, or is there a better solution?

The results should be the same as

    v3 = rbind(c(rep("a",5), rep("c",10), rep("b",3)),
               c(rep("1",5), rep("3",10), rep("2",3)))

---

    > v1
         [,1] [,2] [,3]
    [1,] "1"  "2"  "3"
    [2,] "a"  "b"  "c"
    > v2
     [1] "a" "a" "a" "a" "a" "c" "c" "c" "c" "c" "c" "c" "c" "c" "c" "b" "b" "b"
    > v3
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
    [1,] "a"  "a"  "a"  "a"  "a"  "c"  "c"  "c"  "c"  "c"   "c"   "c"   "c"   "c"
    [2,] "1"  "1"  "1"  "1"  "1"  "3"  "3"  "3"  "3"  "3"   "3"   "3"   "3"   "3"
         [,15] [,16] [,17] [,18]
    [1,] "c"   "b"   "b"   "b"
    [2,] "3"   "2"   "2"   "2"
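The named-vector lookup in the reply is one way to do this; match() is another that avoids any loop or grep. A short sketch on the posted v1 and v2:

```r
v1 <- cbind(c("1", "2", "3"), c("a", "b", "c"))
v2 <- c(rep("a", 5), rep("c", 10), rep("b", 3))

# match() gives, for each element of v2, the row of v1 whose 2nd column equals it;
# that row index then picks the corresponding value from the 1st column.
v3 <- rbind(v2, v1[match(v2, v1[, 2]), 1])
print(v3)
```

Values of v2 that do not occur in v1[, 2] would come back as NA, which makes unmatched entries easy to spot.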
Re: [R] is.na() == TRUE for POSIXlt time / date of "2014-03-09 02:00:00"
Not able to reproduce the problem.

    str(q)
    # POSIXlt[1:1], format: "2014-03-09 02:00:00"
    is.na(q)
    #[1] FALSE

    sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-unknown-linux-gnu (64-bit)

A.K.

On Wednesday, July 30, 2014 1:10 PM, John McKown wrote:

"I'm so confused!" Why does is.na() report TRUE for a POSIXlt date & time of 2014-03-09 02:00:00?

    > q
    [1] "2014-03-09 02:00:00"
    > is.na(q)
    [1] TRUE
    > as.POSIXct(q)
    [1] NA
    > dput(q)
    structure(list(sec = 0, min = 0L, hour = 2, mday = 9L, mon = 2L,
        year = 114L, wday = 0L, yday = 67L, isdst = 0L, zone = "",
        gmtoff = NA_integer_), .Names = c("sec", "min", "hour", "mday",
    "mon", "year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
    "POSIXt"))
    > str(q)
     POSIXlt[1:1], format: "2014-03-09 02:00:00"
    >

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
Re: [R] DATA SUMMARIZING and REPORTING
For the example you gave:

    x ##dataset (MTH_SUPPORT presumably corresponds to YEAR_MTH in the posted data)
    indx <- t(sapply(min(x$MTH_SUPPORT):(max(x$MTH_SUPPORT) - 2), function(x) c(x, x + 2)))
    res <- do.call(rbind, apply(indx, 1, function(.indx) {
        x1 <- x[x$MTH_SUPPORT >= .indx[1] & x$MTH_SUPPORT <= .indx[2], ]
        Period <- paste(.indx[1], .indx[2], sep = "-")
        No.ofChange <- sum(x1$ATT_1[-1] != x1$ATT_1[-length(x1$ATT_1)])
        Paid = with(x1, sum(A3)/(sum(A1) + sum(A2)))
        data.frame(ID_CASE = x$ID_CASE[1L], Period, No.ofChange, Paid, stringsAsFactors = F)
    }))
    res
      ID_CASE        Period No.ofChange      Paid
    1   CB26A 201302-201304           2 0.4143646
    2   CB26A 201303-201305           2 0.4452450
    3   CB26A 201304-201306           1 0.4444444
    4   CB26A 201305-201307           2 0.4607407
    5   CB26A 201306-201308           1 0.4617737
    6   CB26A 201307-201309           1 0.4513274
    7   CB26A 201308-201310           1 0.4613779

With multiple ID_CASE, either split the dataset by ID_CASE or use one of the grouping functions before applying this.

A.K.

On Wednesday, July 30, 2014 8:48 AM, Abhinaba Roy wrote:

Hi R-helpers,

I have a dataframe like

    ID_CASE YEAR_MTH ATT_1  A1 A2 A3
    CB26A     201302     1 146 42 74
    CB26A     201302     0 140 50 77
    CB26A     201303     0 128 36 77
    CB26A     201304     1 146 36 72
    CB26A     201305     1 134 36 80
    CB26A     201305     0 148 30 80
    CB26A     201306     0 134 20 72
    CB26A     201307     1 125 48 79
    CB26A     201309     0 122 44 74
    CB26A     201310     1 126 37 72
    CB26A     201310     1 107 43 75

I want a final dataframe which will look like

    ID_CASE Period        No.ofChange %Paid
    CB26A   201302-201304 2           0.414365
    CB26A   201303-201305 2           0.445245
    CB26A   201304-201306 1           0.444444
    CB26A   201305-201307 2           0.460741
    CB26A   201306-201308 1           0.461774
    CB26A   201307-201309 1           0.451327
    CB26A   201308-201310 1           0.461378

where,

Period = a time period of 3 months which is shifted by 1 month subsequently
No.ofChange = number of times ATT_1 has changed values in this period
%Paid = sum(A3)/(sum(A1)+sum(A2)) for this period

E.g. for Period=201302-201304, %Paid = (74+77+77+72)/((146+140+128+146)+(42+50+36+36))

Period calculation should start from the first YEAR_MTH for the ID_CASE, i.e., if for an ID_CASE the first YEAR_MTH is 201301 or 201304 then the period should be defined accordingly.

I have a dataframe with 400 unique ID_CASE, and I need to do it for all ID_CASE. How can I do it in R?

Regards,
Abhinaba
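One caveat with building the windows as min:max over YYYYMM numbers is that the sequence runs through non-existent months (201313, 201314, ...) once the data cross a year boundary. The sketch below is a base R variant under the assumption that YEAR_MTH is always a YYYYMM number; it converts months to a running month index first. The helper `to_ym` and the column name `idx` are introduced here for illustration only.

```r
x <- data.frame(
  ID_CASE  = "CB26A",
  YEAR_MTH = c(201302, 201302, 201303, 201304, 201305, 201305,
               201306, 201307, 201309, 201310, 201310),
  ATT_1 = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1),
  A1 = c(146, 140, 128, 146, 134, 148, 134, 125, 122, 126, 107),
  A2 = c(42, 50, 36, 36, 36, 30, 20, 48, 44, 37, 43),
  A3 = c(74, 77, 77, 72, 80, 80, 72, 79, 74, 72, 75),
  stringsAsFactors = FALSE)

# Running month index: consecutive across year boundaries.
idx <- (x$YEAR_MTH %/% 100) * 12 + x$YEAR_MTH %% 100
# Hypothetical helper: month index back to YYYYMM.
to_ym <- function(i) ((i - 1) %/% 12) * 100 + (i - 1) %% 12 + 1

# One 3-month window per start month, shifted by one month each time.
res <- do.call(rbind, lapply(min(idx):(max(idx) - 2), function(s) {
  w <- x[idx >= s & idx <= s + 2, ]
  data.frame(ID_CASE = x$ID_CASE[1],
             Period = paste(to_ym(s), to_ym(s + 2), sep = "-"),
             No.ofChange = sum(diff(w$ATT_1) != 0),
             Paid = sum(w$A3) / (sum(w$A1) + sum(w$A2)),
             stringsAsFactors = FALSE)
}))
print(res)
```

For several ID_CASE values, the same body could be wrapped in lapply(split(x, x$ID_CASE), ...), computing `idx` inside each piece.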
Re: [R] DATA SUMMARIZING and REPORTING
With >1 ID_CASE, you may try:

    xN <- x
    xN$ID_CASE <- "CB27A" #creating another ID_CASE, other data same
    x <- rbind(x, xN)
    res1 <- do.call(rbind, lapply(split(x, x$ID_CASE), function(.x) {
        indx <- with(.x, t(sapply(min(MTH_SUPPORT):(max(MTH_SUPPORT) - 2), function(y) c(y, y + 2))))
        do.call(rbind, apply(indx, 1, function(.indx) {
            x1 <- .x[with(.x, MTH_SUPPORT >= .indx[1] & MTH_SUPPORT <= .indx[2]), ]
            Period <- paste(.indx[1], .indx[2], sep = "-")
            x2 <- within(x1, {
                Paid <- sum(A3)/(sum(A1) + sum(A2))
                No.ofChange <- sum(ATT_1[-1] != ATT_1[-length(ATT_1)])
            })
            data.frame(ID_CASE = .x$ID_CASE[1L], Period, No.ofChange = x2$No.ofChange[1L], Paid = x2$Paid[1L], stringsAsFactors = F)
        }))
    }))
    row.names(res1) <- 1:nrow(res1)
    > res1
       ID_CASE        Period No.ofChange      Paid
    1    CB26A 201302-201304           2 0.4143646
    2    CB26A 201303-201305           2 0.4452450
    3    CB26A 201304-201306           1 0.4444444
    4    CB26A 201305-201307           2 0.4607407
    5    CB26A 201306-201308           1 0.4617737
    6    CB26A 201307-201309           1 0.4513274
    7    CB26A 201308-201310           1 0.4613779
    8    CB27A 201302-201304           2 0.4143646
    9    CB27A 201303-201305           2 0.4452450
    10   CB27A 201304-201306           1 0.4444444
    11   CB27A 201305-201307           2 0.4607407
    12   CB27A 201306-201308           1 0.4617737
    13   CB27A 201307-201309           1 0.4513274
    14   CB27A 201308-201310           1 0.4613779

A.K.
Re: [R] Question
Hi Farnoosh,

Regarding the first question:

    dat2 <- dat1
    dat1$Mean <- setNames(unsplit(sapply(split(dat1[,-1], dat1[,1]), rowMeans, na.rm=T), dat1[,1]), NULL)
    dat1
      Unit q1 q2 q3 Mean
    1    A  3  1  2 2.00
    2    A  2 NA  1 1.50
    3    B  2  2  4 2.67
    4    B NA  2  5 3.50
    5    C  3  2 NA 2.50
    6    C  4  1  4 3.00
    7    A  3  2 NA 2.50

The second question is not clear. Assuming that you want something like this:

    dat2[,-1] <- (!is.na(dat2[,-1]))+0
    dat2$indx <- with(dat2, ave(rep(1, nrow(dat2)), Unit, FUN=cumsum))
    library(reshape2)
    dcast(melt(dat2, id.var=c("indx","Unit")), variable+indx~Unit, value.var="value", fill=0)[,-2]
      variable A B C
    1       q1 1 1 1
    2       q1 1 0 1
    3       q1 1 0 0
    4       q2 1 1 1
    5       q2 0 1 1
    6       q2 1 0 0
    7       q3 1 1 0
    8       q3 1 1 1
    9       q3 0 0 0

A.K.

On Wednesday, July 30, 2014 1:42 PM, farnoosh sheikhi wrote:

Hi Arun,

I have two questions. I have data like below:

    dat1 <- read.table(text="
    Unit q1 q2 q3
    A 3 1 2
    A 2 NA 1
    B 2 2 4
    B NA 2 5
    C 3 2 NA
    C 4 1 4
    A 3 2 NA
    ", sep="", header=T, stringsAsFactors=F)

I want to get the average of each row by the number of answered questions. For example, the second row would be (2+1)/2 since there is an NA.

Secondly, I want to pivot the units like UnitA, UnitB, UnitC as columns and have 1 and zero as values.

Thanks a lot for your help.
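Since the mean is taken within each row, the first question can also be sketched with rowMeans() directly. For the second question, which the reply calls unclear, the sketch below assumes one plausible reading: one 0/1 indicator column per Unit. The names `answered` and `unit_ind` are introduced here for illustration.

```r
dat1 <- data.frame(Unit = c("A", "A", "B", "B", "C", "C", "A"),
                   q1 = c(3, 2, 2, NA, 3, 4, 3),
                   q2 = c(1, NA, 2, 2, 2, 1, 2),
                   q3 = c(2, 1, 4, 5, NA, 4, NA),
                   stringsAsFactors = FALSE)
qs <- c("q1", "q2", "q3")

# Row mean over the answered questions only (NA values are dropped per row).
dat1$Mean <- rowMeans(dat1[, qs], na.rm = TRUE)

# 1 where a question was answered, 0 where it is NA.
answered <- (!is.na(dat1[, qs])) + 0

# Assumed reading of the "pivot": one indicator column per unit.
unit_ind <- sapply(unique(dat1$Unit), function(u) as.integer(dat1$Unit == u))

print(dat1)
print(unit_ind)
```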
Re: [R] separate numbers from chars in a string
If you have some variations in the order of numbers followed by chars:

    library(stringr)
    v1 <- c("absdfds0213451ab", "123abcs4145")
    pattern=c("[A-Za-z]+", "\\d+")
    do.call(`Map`, c(c, lapply(pattern, function(.pat) str_extract_all(v1, .pat))))
    #[[1]]
    #[1] "absdfds" "ab"      "0213451"
    #[[2]]
    #[1] "abcs" "123"  "4145"

A.K.

Hi,

If I have a string of consecutive chars followed by consecutive numbers and then chars, like "absdfds0213451ab", how do I separate the consecutive chars from the consecutive numbers?

grep doesn't seem to be helpful:

    grep("[a-z]","absdfds0213451ab", ignore.case=T)
    [1] 1
    grep("[0-9]","absdfds0213451ab", ignore.case=T)
    [1] 1

Thanks
Carol
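grep() only reports which elements of the vector match (hence the 1 in both calls above); it does not return the matched text. A base R sketch that extracts the runs in their original order, using regmatches() with gregexpr():

```r
v1 <- c("absdfds0213451ab", "123abcs4145")

# One alternation pattern: a run of letters OR a run of digits.
# gregexpr() finds every run; regmatches() pulls out the matched text.
tokens <- regmatches(v1, gregexpr("[A-Za-z]+|[0-9]+", v1))
print(tokens)
```

Unlike the Map() version in the reply, which groups letters first and digits second, this keeps the pieces in the order they occur in each string.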
Re: [R] Regex - subsetting parts of a file name.
Try:

    gsub(".*\\.(.*)\\..*","\\1", my.cache.list)
    [1] "subject_test"  "subject_train" "y_test"        "y_train"

    #or
    #(perl() belongs to the stringr releases of that time; newer releases
    # take the pattern directly or wrapped in regex())
    library(stringr)
    str_extract(my.cache.list, perl('(?<=\\.).*(?=\\.)'))
    [1] "subject_test"  "subject_train" "y_test"        "y_train"

A.K.

On Thursday, July 31, 2014 11:05 AM, arnaud gaboury wrote:

A directory is full of data.frame cache files. All these files have the same pattern: df.some_name.RData

    my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData", "df.y_test.RData", "df.y_train.RData")

I want to keep only the part between the two dots. After lots of headache using grep() when trying something like this:

    grep('.(.*?).','df.subject_test.RData',value=T)

I couldn't find a clean one-liner and found this workaround:

    my.cache.list <- gsub('df.','',my.cache.list)
    my.cache.list <- gsub('.RData','',my.cache.list)

The two commands above do the trick, but a clean one-liner with a regular expression would be a more "elegant" way. Does anyone have any suggestions?

TY for help
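Two details matter here: an unescaped '.' in a regular expression matches any character, and grep() returns whole elements rather than the captured part. A base R sketch with the dots escaped and the pattern anchored to the known prefix and suffix (anchoring on "df." and ".RData" is an assumption based on the stated file pattern):

```r
my.cache.list <- c("df.subject_test.RData", "df.subject_train.RData",
                   "df.y_test.RData", "df.y_train.RData")

# "\\." matches a literal dot; the parentheses capture the middle part,
# and "\\1" in the replacement keeps only that capture.
middle <- sub("^df\\.(.*)\\.RData$", "\\1", my.cache.list)
print(middle)
```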
Re: [R] how to extract word before /// in a data frame contain many thousands rows.
Try (if dat is the dataset):

    library(stringr)
    res <- str_extract(dat$Gene.Symbol, perl('[[:alnum:]]+(?= \\/)'))
    res[!is.na(res)]
    #[1] "CDH23"

A.K.

On Thursday, July 31, 2014 9:54 PM, Stephen HK Wong wrote:

Dear All,

I would appreciate it if you could help me out with this. I have a data frame that contains many thousands of rows, with some rows that have the /// symbol, as shown in row 2. I want to extract the word before ///, in this case CDH23. Many thanks.

       Probe.Set.ID             Gene.Symbol
    1  1552301_a_at                   CORO6
    2  1552436_a_at  CDH23 /// LOC100653137
    3  1552477_a_at                    IRF6
    4  1552685_a_at                   GRHL1
    5    1552742_at                   KCNH8
    6  1552752_a_at                   CADM2
    7    1552799_at                 TSNARE1
    8  1552897_a_at                   KCNG3
    9  1552902_a_at                   FOXP2
    10   1552903_at                B4GALNT2

    structure(list(Probe.Set.ID = c("1552301_a_at", "1552436_a_at",
    "1552477_a_at", "1552685_a_at", "1552742_at", "1552752_a_at",
    "1552799_at", "1552897_a_at", "1552902_a_at", "1552903_at"),
    Gene.Symbol = c("CORO6", "CDH23 /// LOC100653137", "IRF6", "GRHL1",
    "KCNH8", "CADM2", "TSNARE1", "KCNG3", "FOXP2", "B4GALNT2"
    )), .Names = c("Probe.Set.ID", "Gene.Symbol"), row.names = c(NA,
    10L), class = "data.frame")

Stephen HK Wong
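The same extraction can be sketched in base R without lookahead: find the rows containing the separator, then cut everything from the separator onwards. Only three of the posted symbols are used here.

```r
gene <- c("CORO6", "CDH23 /// LOC100653137", "IRF6")

# fixed = TRUE treats "///" as a literal string, not a regular expression.
has_sep <- grepl("///", gene, fixed = TRUE)

# Remove optional whitespace, the separator, and everything after it.
first_symbol <- sub("\\s*///.*$", "", gene[has_sep])
print(first_symbol)
```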
Re: [R] How to transform the data frame into the list?
Use ?split()

    split(dat[,-4], dat$Year_Month) #dat is the dataset.

A.K.

    Country Product Price Year_Month
    AE      1       20    201204
    DE      1       20    201204
    CN      1       28    201204
    AE      2       28    201204
    DE      2       28    201204
    CN      2       22    201204
    AE      3       28    201204
    CN      3       28    201204
    AE      1       20    201205
    DE      1       20    201205
    CN      1       28    201205
    AE      2       28    201205
    DE      2       28    201205

How do I create a list which has:

    [[201204]]
    Country Product Price
    AE      1       20
    DE      1       20
    CN      1       28
    AE      2       28
    DE      2       28
    CN      2       22
    AE      3       28
    CN      3       28

    [[201205]]
    Country Product Price
    AE      1       20
    DE      1       20
    CN      1       28
    AE      2       28
    DE      2       28
Re: [R] Better use with gsub
You could try:

    library(stringr)
    simplify2array(str_extract_all(xx, perl('(?<=[A-Z]|\\:)\\d+')))
         [,1] [,2] [,3]  [,4]  [,5]  [,6]
    [1,] "24" "24" "24"  "24"  "24"  "24"
    [2,] "57" "86" "119" "129" "138" "163"

A.K.

On Friday, August 1, 2014 10:49 AM, "Doran, Harold" wrote:

I have done an embarrassingly bad job using a mixture of gsub and strsplit to solve a problem. Below is sample code showing what I have to start with (the vector xx) and I want to end up with two vectors x and y that contain only the digits found in xx. Any regex users with advice most welcome.

Harold

    xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")
    yy <- gsub("S","\\1", xx)
    a1 <- gsub(":"," ", yy)
    a2 <- sapply(a1, function(x) strsplit(x, ' '))
    x <- as.numeric(sapply(a2, function(x) x[1]))
    y <- as.numeric(sapply(a2, function(x) x[2]))
Re: [R] Better use with gsub
Forgot about as.numeric.

    sapply(str_extract_all(xx, perl('(?<=[A-Z]|\\:)\\d+')), as.numeric)
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]   24   24   24   24   24   24
    [2,]   57   86  119  129  138  163
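Because every element has the fixed shape "S<digits>:<digits>", the two vectors can also be sketched in base R with a single split on the colon. This assumes the leading letter is always "S", as in the posted vector.

```r
xx <- c("S24:57", "S24:86", "S24:119", "S24:129", "S24:138", "S24:163")

# Drop the leading "S", split on the literal ":", and stack the pieces row-wise.
parts <- do.call(rbind, strsplit(sub("^S", "", xx), ":", fixed = TRUE))

x <- as.numeric(parts[, 1])
y <- as.numeric(parts[, 2])
print(x)
print(y)
```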
Re: [R] Combining Rows from One Data Frame, Outputting into Another
You could use:

    library(dplyr)
    library(tidyr)
    x.df %>%
        group_by(Year, Group, Eye_Color) %>%
        summarize(n=n()) %>%
        spread(Eye_Color, n, fill=0)
    Source: local data frame [6 x 5]

      Year Group blue brown green
    1 2000     1    2     1     0
    2 2000     2    0     0     2
    3 2001     1    1     0     0
    4 2001     2    1     1     0
    5 2001     3    1     0     0
    6 2002     1    1     0     0

Or

    library(reshape2)
    dcast(x.df, Year+Group~Eye_Color, value.var="Eye_Color")

A.K.

On Friday, August 1, 2014 7:06 PM, Kathy Haapala wrote:

If I have a dataframe x.df as follows:

    > x.df <- data.frame(Year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2002),
                         Group = c(1, 1, 1, 2, 2, 1, 2, 2, 3, 1),
                         Eye_Color = c("blue", "blue", "brown", "green", "green", "blue", "brown", "blue", "blue", "blue"))
    > x.df
       Year Group Eye_Color
    1  2000     1      blue
    2  2000     1      blue
    3  2000     1     brown
    4  2000     2     green
    5  2000     2     green
    6  2001     1      blue
    7  2001     2     brown
    8  2001     2      blue
    9  2001     3      blue
    10 2002     1      blue

how can I turn it into a new dataframe that would take the data from multiple rows of Year/Group combinations and output the data into one row for each combination, like this:

    > x_new.df
      Year Group No_blue No_brown No_green
    1 2000     1       2        1        0
    2 2000     2       0        0        2
    3 2001     1       1        0        0
    4 2001     2       1        1        0
    5 2001     3       1        0        0
    6 2002     1       1        0        0

I've been trying to use for loops, but I'm wondering if anyone has a better or simpler suggestion.
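For comparison, the counting can be sketched in base R without extra packages: take the distinct Year/Group pairs and count each colour within every pair. The helper names `colours`, `grp` and `counts` are introduced for this example; the No_ prefix follows the requested output.

```r
x.df <- data.frame(Year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2002),
                   Group = c(1, 1, 1, 2, 2, 1, 2, 2, 3, 1),
                   Eye_Color = c("blue", "blue", "brown", "green", "green",
                                 "blue", "brown", "blue", "blue", "blue"),
                   stringsAsFactors = FALSE)

colours <- c("blue", "brown", "green")
grp <- unique(x.df[, c("Year", "Group")])   # one row per Year/Group combination

# For each combination, compare its eye colours against every colour name
# and count the matches per colour.
counts <- t(vapply(seq_len(nrow(grp)), function(i) {
  sel <- x.df$Year == grp$Year[i] & x.df$Group == grp$Group[i]
  colSums(outer(x.df$Eye_Color[sel], colours, "=="))
}, numeric(length(colours))))
colnames(counts) <- paste0("No_", colours)

x_new.df <- data.frame(grp, counts)
rownames(x_new.df) <- NULL
print(x_new.df)
```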
Re: [R] Compare data in two rows and replace objects in data frame
You could try data.table:

    #dat1 is the dataset
    library(data.table)
    v1 <- setNames(c("HT", "A", "B", "Aht", "Bht"), c("11", "10", "01", "1-", "-1"))
    dat2 <- setDT(dat1)[, lapply(.SD, function(x) v1[paste(x, collapse="")]), by=CloneID]

A.K.

On Monday, August 4, 2014 5:55 AM, raz wrote:

Dear all,

I have a data frame of 144 x 2 values. I need to take every value in the first row and compare it to the second row, and the same for rows 3-4 and 5-6 and so on. The output should be one line for each two-row comparison. The comparison is:

    if row1==1 and row2==1 <-'HT'
    if row1==1 and row2==0 <-'A'
    if row1==0 and row2==1 <-'B'
    if row1==1 and row2=='-' <-'Aht'
    if row1=='-' and row2==1 <-'Bht'

For example, if the data is:

    CloneID   genotype 2001 genotype 2002 genotype 2003
    2471250   1             1             1
    2471250   0             0             0
    2433062   0             0             0
    2433062   1             1             1
    100021605 1             1             0
    100021605 1             0             1
    15599     1             1             0
    15599     1             1             1
    12798     1             1             0
    12798     1             1             1

then the output should be:

    CloneID   genotype 2001 genotype 2002 genotype 2003
    2471250   A             A             A
    2433062   B             B             B
    100021605 HT            A             B
    15599     HT            HT            B
    12798     HT            HT            B

I tried this for the whole data, but it's so slow:

    AX <- data.frame(lapply(AX, as.character), stringsAsFactors=FALSE)
    for (i in seq(1,nrow(AX),by=2)){
        for (j in 6:144){
            if (AX[i,j]==1 & AX[i+1,j]==0){ AX[i,j]<-'A' }
            if (AX[i,j]==0 & AX[i+1,j]==1){ AX[i,j]<-'B' }
            if (AX[i,j]==1 & AX[i+1,j]==1){ AX[i,j]<-'HT' }
            if (AX[i,j]==1 & AX[i+1,j]=="-"){ AX[i,j]<-'Aht' }
            if (AX[i,j]=="-" & AX[i+1,j]==1){ AX[i,j]<-'Bht' }
        }
    }
    AX1<-AX[!duplicated(AX[,3]),]
    AX2<-AX[duplicated(AX[,3]),]

Thanks for any help,
Raz

--
\m/
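The lookup idea behind the data.table answer (paste the two values of a pair into a key such as "10" and look that key up in a named vector) can be sketched in base R as well. The small data frame and its column names g2001, g2002 are made up for this example, and it assumes the two rows of each clone are adjacent, as in the posted data.

```r
# Two clones, two genotype columns (hypothetical names), values stored as characters.
geno <- data.frame(CloneID = c(2471250, 2471250, 100021605, 100021605),
                   g2001 = c("1", "0", "1", "1"),
                   g2002 = c("1", "0", "1", "0"),
                   stringsAsFactors = FALSE)

# Named lookup vector: key = first-row value followed by second-row value.
code <- c("11" = "HT", "10" = "A", "01" = "B", "1-" = "Aht", "-1" = "Bht")

odd <- seq(1, nrow(geno), by = 2)   # first row of every pair
calls <- sapply(geno[-1], function(col) unname(code[paste0(col[odd], col[odd + 1])]))

res <- data.frame(CloneID = geno$CloneID[odd], calls, stringsAsFactors = FALSE)
print(res)
```

Pairs that are not in the lookup vector (for example "00") come out as NA, so unexpected combinations remain visible instead of being silently kept.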
Re: [R] extract descriptive stats for categorial data from dataframe
You could try:

    lv <- levels(unique(unlist(df)))
    as.data.frame(t(apply(df, 2, function(x) table(factor(x, levels=lv)))))
        +  - 0
    i1 10  0 0
    i2 10  0 0
    i3  0 10 0
    i4  0  9 1
    i5 10  0 0
    i6  1  9 0
    i7  9  0 1
    i8  4  2 4
    i9  7  1 2

A.K.

On Tuesday, August 5, 2014 5:36 AM, Alain D. wrote:

Dear R-List,

I want to have descriptive stats in a special form and cannot figure out a nice solution.

    df<-as.data.frame(cbind(i1=rep("+"),i2=rep("+",10),i3=rep("-",10),
        i4=c(rep("-",2),"0",rep("-",7)),i5=rep("+",10),i6=c(rep("-",9),"+"),
        i7=c(rep("+",4),"0",rep("+",5)),i8=c(rep(0,4),rep("+",3),"-","+","-"),
        i9=c(rep("+",5),"-",rep("+",2),rep(0,2))))

Now I want the categories as var labels arranged in cols, with IDs as the first col and then frequencies for each category. Something like this:

    var  +  - 0
    i1  10  0 0
    i2  10  0 0
    i3   0 10 0
    i4   0  9 1
    i5  10  0 0
    i6   1  9 0
    i7   9  0 1
    i8   4  2 4
    i9   7  1 2

I tried different combinations of

    freq<-as.data.frame(df<-lapply(df,table))

but was not very successful. I would be very thankful for an easy solution which is probably too obvious for me to spot.

Thank you very much.

Best wishes

Alain
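The key step in the reply is counting every column against one common set of levels, so that categories missing from a column still get a 0. A sketch that spells the levels out instead of deriving them from factor levels (so it does not depend on the stringsAsFactors default); the three-column data frame is a shortened, made-up version of the posted one.

```r
df <- data.frame(i1 = rep("+", 4),
                 i4 = c("-", "-", "0", "-"),
                 i8 = c("0", "+", "-", "+"),
                 stringsAsFactors = FALSE)

lv <- c("+", "-", "0")   # the common set of categories, stated explicitly

# match() maps each value to its position in lv; tabulate() counts the positions,
# always returning one count per level, including zeros.
freq <- t(vapply(df, function(x) tabulate(match(x, lv), nbins = length(lv)),
                 integer(length(lv))))
colnames(freq) <- lv
print(freq)
```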
Re: [R] populating matrix with binary variable after matching data from data frame
You could try: x1$V2[1] <- "TCLA1" x[outer(rownames(x), colnames(x), FUN=paste) %in% as.character(interaction(x1, sep=" "))] <- 1 x TCLA1 VPS41 ABCA13 ABCA4 AKT3 1 0 0 0 AKTIP 0 1 0 0 ABCA13 0 0 0 0 ABCA4 0 0 0 0 A.K. On Tuesday, August 12, 2014 8:16 PM, Adrian Johnson wrote: Hi: sorry I have a basic question. I have a data frame with two columns: > x1 V1 V2 1 AKT3 TCL1A 2 AKTIP VPS41 3 AKTIP PDPK1 4 AKTIP GTF3C1 5 AKTIP HOOK2 6 AKTIP POLA2 7 AKTIP KIAA1377 8 AKTIP FAM160A2 9 AKTIP VPS16 10 AKTIP VPS18 I have a matrix 1211x1211 (using some elements in x1$V1 and some from x1$V2). I want to populate for every match for example AKT3 = TCL1A = 1 whereas AKT3 - VPS41 gets 0) How can i map this binary relations in x. >x TCLA1 VPS41 ABCA13 ABCA4 AKT3 0 0 0 0 AKTIP 0 0 0 0 ABCA13 0 0 0 0 ABCA4 0 0 0 0 dput - x = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(4L, 4L), .Dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4" ), c("TCLA1", "VPS41", "ABCA13", "ABCA4"))) x1 = structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A", "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2", "VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA, 10L), class = "data.frame") Thanks Adrian [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
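Another way to fill such a matrix, sketched here under assumptions, is to index it with a two-column character matrix of (row name, column name) pairs. The objects below are small stand-ins for the poster's `x` and `x1`; the column is spelled "TCL1A" consistently in this toy example, and `keep` is a helper name introduced here.

```r
# Toy stand-ins for the matrix and the pair table from the question
x <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                            c("TCL1A", "VPS41", "ABCA13", "ABCA4")))
x1 <- data.frame(V1 = c("AKT3", "AKTIP", "AKTIP"),
                 V2 = c("TCL1A", "VPS41", "PDPK1"),
                 stringsAsFactors = FALSE)

# Keep only pairs whose names exist in the matrix; unknown names would error
keep <- x1$V1 %in% rownames(x) & x1$V2 %in% colnames(x)

# A two-column character matrix selects cells by (row name, column name)
x[cbind(x1$V1[keep], x1$V2[keep])] <- 1
x
```

This avoids building every row/column name combination with outer(), which may matter for a 1211 x 1211 matrix.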
Re: [R] how to avoid change string to number automaticlly in r
A similar post was found in stackoverflow (http://stackoverflow.com/questions/25328311/how-to-avoid-change-string-to-number-automaticlly-in-r), which already got an accepted reply. A.K. On Friday, August 15, 2014 2:18 PM, Wenlan Tian wrote: I was trying to save some string into a matrix, but it automatically changed to numbers (levels). How can i avoid it?? Here is the original table: trt means M1 0 12.16673 a2 111 11.86369 ab3 125 11.74433 ab4 14 11.54073 b I wanna to save to a matrix like: J0001 a ab ab b But, what i get is: J0001 1 2 2 3 How can i avoid this? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regex pattern assistance
Hi Tom, You could try: library(stringr) str_extract(x, perl("(?<=[A-Za-z]{4}/).*(?=/[0-9])")) #[1] "S01-012" A.K. On Friday, August 15, 2014 12:20 PM, Tom Wright wrote: Hi, Can anyone please assist. given the string > x<-"/mnt/AO/AO Data/S01-012/120824/" I would like to extract "S01-012" require(stringr) > str_match(x,"\\/mnt\\/AO\\/AO Data\\/(.+)\\/+") > str_match(x,"\\/mnt\\/AO\\/AO Data\\/(\\w+)\\/+") both nearly work. I expected I would use something like: > str_match(x,"\\/mnt\\/AO\\/AO Data\\/([\\w -]+)\\/+") but I don't seem able to get the square bracket grouping to work correctly. Can someone please show me where I am going wrong? Thanks, Tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
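The same extraction can also be sketched in base R without lookarounds, by capturing everything up to the next slash after the fixed prefix. The prefix is taken from the example path in the question; `id` is a name chosen here.

```r
x <- "/mnt/AO/AO Data/S01-012/120824/"

# Capture the path component that follows the fixed prefix;
# [^/]+ means "one or more characters that are not a slash"
id <- sub("^/mnt/AO/AO Data/([^/]+)/.*$", "\\1", x)
id
```

Using a negated character class sidesteps the question of which shorthand classes are allowed inside square brackets in a given regex flavour.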
Re: [R] ANY ONE HERE PLZ Urgent
Try:

format(as.Date("05/07/2014", "%m/%d/%Y"), "%m")
#[1] "05"
#or
strptime("05/07/2014", "%m/%d/%Y")$mon+1
#[1] 5

A.K.

How to extract a month from a Date object? Almost 13 people visited my question in "New to R" without replying. I have a task to finish, so could you please help me? I tried this:

as.month("05/07/2014", format = "%m")
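If the month is needed as a number rather than as text, one possible follow-up is to convert the formatted string. A small sketch, with `d`, `month_chr` and `month_num` as names chosen here:

```r
# Parse the string using the month/day/year layout from the question
d <- as.Date("05/07/2014", format = "%m/%d/%Y")

month_chr <- format(d, "%m")        # zero-padded character month
month_num <- as.integer(month_chr)  # plain integer month
```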
Re: [R] r convert current date format from y-m-d to m/d/y
Hi,

Use ?format

format(d, "%m/%d/%Y")
#[1] "09/01/2014"

A.K.

On Monday, September 1, 2014 5:26 AM, Velappan Periasamy wrote:

d=Sys.Date()
"2014-09-01"

How to convert this "2014-09-01" to "09/01/2014" format? (ie y-m-d to m/d/y format)

thanks
veepsirtt
Re: [R] Generate sequence of date based on a group ID
If the `ids` are ordered as shown in the example, perhaps you need tbl <- table(df$id) rep(seq(as.Date("2000-01-01"), length.out=length(tbl), by=1), tbl) [1] "2000-01-01" "2000-01-01" "2000-01-01" "2000-01-01" "2000-01-01" [6] "2000-01-02" "2000-01-02" "2000-01-02" "2000-01-02" "2000-01-02" [11] "2000-01-03" "2000-01-03" "2000-01-03" "2000-01-03" "2000-01-03" [16] "2000-01-04" "2000-01-04" "2000-01-04" "2000-01-04" "2000-01-05" [21] "2000-01-05" "2000-01-05" "2000-01-05" A.K. On Wednesday, October 8, 2014 3:57 AM, Kuma Raj wrote: I want to generate a sequence of date based on a group id(similar IDs should have same date). The id variable contains unequal observations and the length of the data set also varies. How could I create a sequence that starts on specific date (say January 1, 2000 onwards) and continues until the end without specifying length? Sample data follows: df<-structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), out1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L)), .Names = c("id", "out1"), class = "data.frame", row.names = c(NA, -23L)) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
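If the ids are not guaranteed to be sorted or contiguous, a variant that maps each distinct id to a day offset might be safer. This is a sketch with a made-up `id` vector; `start`, `day_index` and `dates` are names chosen here.

```r
id <- c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L)
start <- as.Date("2000-01-01")

# Position of each id among the distinct ids, in order of first appearance
day_index <- match(id, unique(id))

# Same id -> same date; the sequence grows with the number of distinct ids
dates <- start + (day_index - 1L)
dates
```

No length has to be specified: the last date simply follows from how many distinct ids the data contains.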
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
You could try

library(dplyr)
data1 %>%
    rowwise() %>%
    mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups:

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

A.K.

On Saturday, November 8, 2014 11:42 PM, "Muhuri, Pradip (SAMHSA/CBHSQ)" wrote:

Hello,

The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations. I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package. I am getting correct results (NA in the new column) if a given row has all NA's in the four columns. However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).

I would appreciate receiving your help toward resolving the issue. Please see the R console and the R script (reproducible example) below.

Thanks in advance.

Pradip

## from the console
print (data2)
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04

## Reproducible code and data
library(dplyr)
library(lubridate)
library(zoo)
# data object - description of the
temp <- "id mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"
# read the data object
data1 <- read.table(textConnection(temp), colClasses=c("character", "Date", "Date", "Date", "Date"), header=TRUE, as.is=TRUE )
# create a new column
data2 <- mutate(data1, oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate) & is.na(haldate), NA, max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE ) ) )
# convert to date
data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")
# print records
print (data2)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
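The reason the original mutate() call gives 2011-11-04 in every row appears to be that max() is evaluated over the whole columns at once, so it returns a single overall maximum. As an alternative to rowwise(), the element-wise maximum pmax() can be sketched in base R. The three-row `data1` below is a shortened copy of the poster's data, and this is offered as an assumption-laden sketch rather than as the method used in the thread.

```r
# Shortened copy of the example data: rows 1 to 3 of the original
data1 <- data.frame(
  id = c("1", "2", "3"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13")),
  haldate = as.Date(c("2007-11-07", NA, NA)),
  stringsAsFactors = FALSE
)

# pmax() compares the columns position by position, i.e. row by row;
# with na.rm = TRUE a row is NA only when all four dates are NA
data1$oidflag <- pmax(data1$mrjdate, data1$cocdate,
                      data1$inhdate, data1$haldate, na.rm = TRUE)
data1
```

Because an all-NA row stays NA here, no -Inf value and no warning from max() should arise for such rows.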
Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
Dear Pradip,

From the documentation of ?max:

  The minimum and maximum of a numeric empty set are '+Inf' and '-Inf'

One of the rows in your dataset is all NAs. I am not sure whether you want to keep that row. You could remove it and run the code, or keep it and run with that warning.

data1 <- data1[rowSums(is.na(data1[,-1]))!=4,]
data1 %>%
    rowwise() %>%
    mutate(oldflag=as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01'))

A.K.

On Sunday, November 9, 2014 9:16 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" wrote:

Dear Arun,

Thank you so much for sending me the dplyr/mutate() solution to my code. But I am getting the following warning message. Any suggestions on how to avoid this message?

Pradip

Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

> data1 %>%
+ rowwise() %>%
+ mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
+ na.rm=TRUE), origin='1970-01-01'))
Source: local data frame [7 x 6]
Groups:

  id    mrjdate    cocdate    inhdate    haldate    oldflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
Warning message:
In max(13081, NA_real_, NA_real_, 15282, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260
Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)
Try range(data2$oiddate[complete.cases(data2$oiddate) & is.finite(data2$oiddate)]) #[1] "2006-09-01" "2011-11-04" If you look at the `dput` output, it is `Inf` for oiddate dput(data2$oiddate) structure(c(14078, -Inf, 15260, 13796, 13392, 15252, 15282), class = "Date") A.K. On Monday, November 10, 2014 11:15 AM, "Muhuri, Pradip (SAMHSA/CBHSQ)" wrote: Hello, The range() with complete.cases() removes NA's for the date variables that are read from a data frame. However, the issue is that the same function does not remove NA's for the other date variable that is created using the dplyr/mutate(). The console and the reproducible example are given below. Any advice how to resolve this issue would be appreciated. Thanks, Pradip Muhuri # cut and pasted from the R console idmrjdatecocdateinhdatehaldateoiddate 1 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18 2 2 3 3 2009-10-242011-10-132011-10-13 4 4 2007-10-10 2007-10-10 5 5 2006-09-01 2005-08-10 2006-09-01 6 6 2007-09-04 2011-10-05 2011-10-05 7 7 2005-10-25 2011-11-04 2011-11-04 > > # range of dates > > range(data2$mrjdate[complete.cases(data2$mrjdate)]) [1] "2004-11-04" "2009-10-24" > range(data2$cocdate[complete.cases(data2$cocdate)]) [1] "2005-08-10" "2011-10-05" > range(data2$inhdate[complete.cases(data2$inhdate)]) [1] "2005-07-07" "2011-10-13" > range(data2$haldate[complete.cases(data2$haldate)]) [1] "2007-11-07" "2011-11-04" > range(data2$oiddate[complete.cases(data2$oiddate)]) [1] NA "2011-11-04" reproducible code # library(dplyr) library(lubridate) library(zoo) # data object - description of the temp <- "id mrjdate cocdate inhdate haldate 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2 NA NA NA NA 3 2009-10-24 NA 2011-10-13 NA 4 2007-10-10 NA NA NA 5 2006-09-01 2005-08-10 NA NA 6 2007-09-04 2011-10-05 NA NA 7 2005-10-25 NA NA 2011-11-04" # read the data object data1 <- read.table(textConnection(temp), colClasses=c("character", "Date", "Date", "Date", "Date"), header=TRUE, as.is=TRUE ) # create a new column 
data2 <- data1 %>% rowwise() %>% mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')) # print records print (data2) # range of dates range(data2$mrjdate[complete.cases(data2$mrjdate)]) range(data2$cocdate[complete.cases(data2$cocdate)]) range(data2$inhdate[complete.cases(data2$inhdate)]) range(data2$haldate[complete.cases(data2$haldate)]) range(data2$oiddate[complete.cases(data2$oiddate)]) Pradip K. Muhuri, PhD SAMHSA/CBHSQ 1 Choke Cherry Road, Room 2-1071 Rockville, MD 20857 Tel: 240-276-1070 Fax: 240-276-1260 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
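The point of the reply is that -Inf is not NA, so complete.cases() and is.na() do not flag it, even though a Date holding -Inf prints like a missing value. A small sketch, using the first three values from the dput() output in the reply; `d`, `ok` and `d_clean` are names chosen here.

```r
# First three values from the dput() output: a Date vector holding -Inf
d <- structure(c(14078, -Inf, 15260), class = "Date")

is.na(d)            # -Inf does not count as missing
ok <- is.finite(d)  # but it is not finite either

# Range over the finite dates only
rng <- range(d[ok])

# Alternatively, turn non-finite entries into proper NAs once
d_clean <- d
d_clean[!is.finite(d_clean)] <- NA
range(d_clean, na.rm = TRUE)
```

Converting to NA once, right after the column is created, may be tidier than remembering is.finite() in every later call.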
Re: [R] Subsetting multiple rows of a data frame at once
Hi, Try this: set.seed(24) df<- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),y= sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),z=rnorm(1e5)) #Used a shorter vector x1<- c(1.05,2.85,3.40,4.25,0.25) y1<- c(0.25,0.10,0.90,0.25,1.05) res<-do.call(rbind,lapply(seq_along(x1),function(i) subset(df,x==x1[i]&y==y1[i]))) head(res,2) # x y z #466 1.05 0.25 0.7865224 #4119 1.05 0.25 -1.5679096 tail(res,2) # x y z #98120 0.25 1.05 -2.1239596 #98178 0.25 1.05 0.3321464 A.K. Hi Everyone, First time poster so any posting rules i should know about feel free to advise... I've got a data frame of 250 000 rows in columns of x y and z. i need to extract 20-30 rows from the data frame with specific x and y values, such that i can find the z value that corresponds. There is no repeated data. (its actually 250 000 squares in a 5x5m grid) to find them individually i can use subset successfully result<-subset(df,x==1.05 & y==c0.25) gives me the row in the dataframe with that x and y value. so if i have x = 1.05 2.85 3.40 4.25 0.25 3.05 3.70 0.20 0.30 0.70 1.05 1.20 1.40 1.90 2.70 3.25 3.55 4.60 2.05 2.15 3.70 4.85 4.90 1.60 2.45 3.20 3.90 4.45 and y= 0.25 0.10 0.90 0.25 1.05 1.70 2.05 2.90 2.35 2.60 2.55 2.15 2.75 2.05 2.70 2.25 2.55 2.05 3.65 3.05 3.00 3.50 3.75 4.85 4.50 4.50 3.35 4.90 then how can i retrieve the rows for all those values at once. if i name x=xt and y=yt and then result<-subset(df,x==xt & y==yt) then i get result [1] x y Height <0 rows> (or 0-length row.names) i dont understand why zero rows are selected. obviously im applying the vectors inappropriately, but i cant seem to find anything on this method of subsetting online. Thanks for any replies! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
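The zero-row result in the question most likely comes from recycling: `x==xt` compares the column with `xt` position by position, not each row against every wanted pair. A toy sketch with whole numbers (to keep floating-point issues out of the picture) shows the difference; all object names below are made up for the illustration.

```r
df <- data.frame(x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40),
                 z = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)
xt <- c(3, 1)
yt <- c(30, 10)

# Position-wise comparison: xt is recycled to c(3, 1, 3, 1),
# so only rows that happen to line up are found
naive <- df[df$x == xt & df$y == yt, ]

# Pair-wise lookup: build one key per (x, y) pair and test membership
hits <- df[paste(df$x, df$y) %in% paste(xt, yt), ]
hits
```

In this toy case `naive` finds only one of the two wanted rows, while the key-based lookup finds both.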
Re: [R] change cell values
Hi, set.seed(24) mat1=matrix(rnorm(12),3) set.seed(28) mat2=matrix(rnorm(12),3) indx<- mat1<1 & mat2<1 mat1[indx]<-NA mat2[indx]<-NA mat1 # [,1] [,2] [,3] [,4] #[1,] NA NA NA 0.002311942 #[2,] NA NA NA NA #[3,] NA NA NA 0.598269113 mat2 # [,1] [,2] [,3] [,4] #[1,] NA NA NA 1.841481 #[2,] NA NA NA NA #[3,] NA NA NA 1.520367 A.K. - Original Message - From: JiangZhengyu To: "r-help@r-project.org" Cc: Sent: Wednesday, July 3, 2013 5:27 PM Subject: [R] change cell values Dear R experts, I have two matrices (mat1 & mat2) with the same dimension & the cells (row and column) are corresponding to each other. I want to change cell values to NA given values of the corresponding cells in mat1 and mat2 are both <1. E.g. both mat1[2,3] and mat2[2,3] are <1, I will put mat1[2,3]=NA, and mat2[2,3]=NA; if either mat1[2,3]>=1 or mat2[2,3]>=1, I will save both cells. I tried the code, but not working. Could anyone can help fix the problem? mat1[mat1<1&mat2<1]=NA mat2[mat1<1&mat2<1]=NA > mat1=matrix(rnorm(12),3) > mat2=matrix(rnorm(12),3) > mat1 [,1] [,2] [,3] [,4] [1,] -1.3387075 -0.7142333 -0.5614211 0.1846955 [2,] -0.7936087 -0.2215797 -0.3686067 0.7328731 [3,] 0.6505082 0.1826019 1.5577883 -1.5580384 > mat2 [,1] [,2] [,3] [,4] [1,] 0.4331573 -1.8086826 -1.7688123 -1.4278934 [2,] -0.1841451 0.1738648 -1.1086942 1.3065109 [3,] -1.0827245 -0.4143808 -0.6889405 0.4046203 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
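Why the poster's two-line attempt fails is worth spelling out: after the first assignment `mat1` already contains NA, so the condition recomputed for `mat2` is NA in exactly the cells of interest, and, as far as I know, a subscripted assignment with a single replacement value skips NA positions. A sketch with small fixed matrices (made up here) that computes the index once:

```r
mat1 <- matrix(c(0.5, 2, 0.2, 3), nrow = 2)
mat2 <- matrix(c(0.1, 0.3, 5, 4), nrow = 2)

# Compute the condition once, before either matrix is modified
indx <- mat1 < 1 & mat2 < 1

mat1[indx] <- NA
mat2[indx] <- NA

# For comparison, the order-dependent version leaves the second matrix untouched
m1 <- matrix(c(0.5, 2, 0.2, 3), nrow = 2)
m2 <- matrix(c(0.1, 0.3, 5, 4), nrow = 2)
m1[m1 < 1 & m2 < 1] <- NA
m2[m1 < 1 & m2 < 1] <- NA   # condition is now NA where m1 is NA
```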
Re: [R] modify timestemp
Hi,

Maybe this helps:

# dat1 is the dataset
dat1[,2]<-gsub("\\d+$","00",dat1[,2])
dat1
#         Date     Time
#1  01/01/2013 00:09:00
#2  01/02/2013 00:10:00
#3  01/03/2013 00:11:00
#4  01/04/2013 00:12:00
#5  01/05/2013 00:13:00
#6  01/06/2013 00:15:00
#7  01/07/2013 00:16:00
#8  01/08/2013 00:17:00
#9  01/09/2013 00:18:00
#10 01/10/2013 00:19:00

A.K.

Hey All,

I want to standardize my timestamp, which is formatted as hh:mm:ss. My data looks like this:

Date       Time
01/01/2013 00:09:01
01/02/2013 00:10:14
01/03/2013 00:11:27
01/04/2013 00:12:40
01/05/2013 00:13:53
01/06/2013 00:15:06
01/07/2013 00:16:19
01/08/2013 00:17:32
01/09/2013 00:18:45
01/10/2013 00:19:58

Dataset <- structure(list(Date = c("01/01/2013", "01/02/2013", "01/03/2013", "01/04/2013", "01/05/2013", "01/06/2013", "01/07/2013", "01/08/2013", "01/09/2013", "01/10/2013"), Time = c("00:09:01", "00:10:14", "00:11:27", "00:12:40", "00:13:53", "00:15:06", "00:16:19", "00:17:32", "00:18:45", "00:19:58")), .Names = c("Date", "Time"), class = "data.frame", row.names = c(NA, -10L))

I would like to change all the records in the "Time" column to the uniform format hh:mm:00, so that the output would be this:

Date       Time
01/01/2013 00:09:00
01/02/2013 00:10:00
01/03/2013 00:11:00
01/04/2013 00:12:00
01/05/2013 00:13:00
01/06/2013 00:15:00
01/07/2013 00:16:00
01/08/2013 00:17:00
01/09/2013 00:18:00
01/10/2013 00:19:00

Thanks for your help!
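A slightly stricter pattern, sketched here on a made-up vector `tm`, replaces exactly the two digits after the last colon. Note that this truncates the seconds rather than rounding to the nearest minute, which matches the output requested in the question.

```r
tm <- c("00:09:01", "00:13:53", "00:19:58")

# Replace the final ":ss" with ":00"; anything before it is left alone
tm_min <- sub(":\\d{2}$", ":00", tm)
tm_min
```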
Re: [R] Subsetting multiple rows of a data frame at once
Hi,

Possibly FAQ 7.31. Using the same example:

set.seed(24)
df<- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),y= sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),z=rnorm(1e5))
dfOld<- df
df[,1:2]<- lapply(df[,1:2],function(x) sprintf("%.2f",x))
x1<- c(1.05,2.85,3.40,4.25,0.25)
y1<- c(0.25,0.10,0.90,0.25,1.05)
x1New<-sprintf("%.2f",x1)
y1New<- sprintf("%.2f",y1)
res1<-do.call(rbind,lapply(seq_along(x1New),function(i) subset(df,x==x1New[i]&y==y1New[i])))
res<-do.call(rbind,lapply(seq_along(x1),function(i) subset(dfOld,x==x1[i]&y==y1[i])))
dim(res1)
#[1] 318 3
dim(res)
#[1] 250 3
res1[,1:2]<- lapply(res1[,1:2],as.numeric)
str(res1)
#'data.frame': 318 obs. of 3 variables:
# $ x: num 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 ...
# $ y: num 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
# $ z: num 0.787 -1.568 -1.626 -0.221 -0.7 ...

A.K.

Never mind, that was an error on my behalf; I got it going. I have another issue: it leaves some values out. I have separately searched the df and they are definitely in there, so is there some sort of exclusion rule? About 8 of the 28 are missing; the first missing row is 3.05,1.70. I looked up the documentation for subset but I can't see why it would skip some.

Thanks
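FAQ 7.31 concerns exact comparison of floating-point numbers. The standard small illustration, together with two common workarounds, might look like this; the tolerance 1e-8 and the scaling by 100 are choices made for this sketch, not values from the thread.

```r
a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                 # exact comparison of doubles
near   <- abs(a - b) < 1e-8      # comparison with a tolerance
as_int <- round(a * 100) == round(b * 100)  # compare as whole hundredths
```

Here `exact` is FALSE although the two numbers print the same, while both workarounds give TRUE.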
Re: [R] Subsetting multiple rows of a data frame at once
Hi, carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01))) dim(carbon.fit) #[1] 251001 2 xtNew<-sprintf("%.2f",xt) ytNew<- sprintf("%.2f",yt) carbon.fit[]<- lapply(carbon.fit,function(x) sprintf("%.2f",x)) res<-do.call(rbind,lapply(seq_along(xtNew),function(i) subset(carbon.fit,x==xtNew[i]&y==ytNew[i]))) nrow(res) #[1] 28 res # x y #12631 1.05 0.25 #5296 2.85 0.10 #45431 3.40 0.90 #12951 4.25 0.25 #52631 0.25 1.05 #85476 3.05 1.70 #103076 3.70 2.05 #145311 0.20 2.90 #117766 0.30 2.35 #130331 0.70 2.60 #127861 1.05 2.55 #107836 1.20 2.15 #137916 1.40 2.75 #102896 1.90 2.05 #135541 2.70 2.70 #113051 3.25 2.25 #128111 3.55 2.55 #103166 4.60 2.05 #183071 2.05 3.65 #153021 2.15 3.05 #150671 3.70 3.00 #175836 4.85 3.50 #188366 4.90 3.75 #243146 1.60 4.85 #225696 2.45 4.50 #225771 3.20 4.50 #168226 3.90 3.35 #245936 4.45 4.90 A.K. From: Shaun ♥ Anika To: "smartpink...@yahoo.com" Sent: Thursday, July 4, 2013 12:08 AM Subject: RE: Subsetting multiple rows of a data frame at once Hi There, i can give you the data needed to perform this task... 
library(akima) library(fields) xt<- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, 1.20, 1.40, 1.90, 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 3.20, 3.90, 4.45) yt<- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, 2.15, 2.75, 2.05, 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 4.50, 3.35, 4.90) xs<- c(0.45, 1.05, 2.75, 3.30, 4.95, 0.40, 1.05, 2.30, 3.45, 4.60, 0.05, 1.95, 2.95, 3.70, 4.55, 0.75, 1.60, 2.10, 3.60, 4.90, 0.05, 1.35, 2.60, 3.40, 4.25) ys<- c(0.45, 0.95, 0.75, 0.95, 0.10, 1.90, 1.45, 1.25, 1.45, 1.05, 2.85, 2.60, 2.05, 2.60, 2.55, 3.75, 3.30, 3.95, 3.45, 3.70, 4.95, 4.35, 4.55, 4.40, 4.95) carbon<- c(1.43, 1.82, 1.40, 1.43, 1.96, 1.61, 1.91, 1.53, 1.17, 1.83, 2.43, 2.02, 1.66, 2.45, 2.46, 1.39, 1.10, 1.38, 1.91, 2.13, 1.88, 1.26, 2.15, 1.89, 1.69) carbon.df=data.frame(x=xs,y=ys,z=carbon) carbon.loess= loess(z~x*y, data= carbon.df, degree= 2) carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01))) z=predict(carbon.loess, newdata= carbon.fit) carbon.fit$Height=as.numeric(z) image.plot(seq(0,5,0.01,), seq(0,5,0.01), z, xlab = "", ylab="",main = "Carbon") trees<-do.call(rbind,lapply(seq_along(xt),function(i) subset(carbon.fit,x==xt[i]&y==yt[i]))) ## xt is 28 integers long and when i run the above code it only returns the values of 18 out of the 28 (xt,yt) pairs that i want. thanks for your help!! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to choose dates data?
Hi, You could try: day<-as.Date(c("2008-04-12","2011-07-02","2011-09-02","2008-04-12","2008-04-12")) indx<-gsub("-.*","",day) day[indx>="2007" & indx<="2009"] #[1] "2008-04-12" "2008-04-12" "2008-04-12" #or library(xts) xt1<- xts(seq_along(day),day) index(xt1["2007/2009"]) #[1] "2008-04-12" "2008-04-12" "2008-04-12" #or library(chron) yr1<-month.day.year(unclass(day))$year day[yr1>=2007 & yr1<=2009] #[1] "2008-04-12" "2008-04-12" "2008-04-12" A.K. - Original Message - From: Gallon Li To: r-help Cc: Sent: Thursday, July 4, 2013 2:31 AM Subject: [R] how to choose dates data? i have converted my data into date format like below: > day=as.Date(originaldate,"%m/%d/%Y") > day[1:5] [1] "2008-04-12" "2011-07-02" "2011-09-02" "2008-04-12" "2008-04-12" I wish to select only those observations from 2007 to 2009, how can I select from this list? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
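Since `day` is already of class Date, it can also be compared with Date bounds directly, without extracting the year as text. A sketch using the five dates from the question; the bounds are spelled out here as the first and last day of the wanted period.

```r
day <- as.Date(c("2008-04-12", "2011-07-02", "2011-09-02",
                 "2008-04-12", "2008-04-12"))

# Dates compare like numbers, so a closed interval is two comparisons
sel <- day[day >= as.Date("2007-01-01") & day <= as.Date("2009-12-31")]
sel
```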
Re: [R] help on selecting values of an object
Hi, You could use: d1<- data.frame(a,b) k1<-data.frame(a=k) library(plyr) join(k1,d1,by="a")[,2] #[1] 4 4 6 6 7 7 6 A.K. - Original Message - From: Andras Farkas To: r-help@r-project.org Cc: Sent: Thursday, July 4, 2013 2:09 PM Subject: [R] help on selecting values of an object Dear List, please provide some input on the following: we have a <-c(0,1,2,3) b <-c(4,5,6,7) d <-cbind(a,b) k <-c(0,0,2,2,3,3,2) "k" in this case consists of some values of "d[,1]" in a random sequence. What I am trying to do is to create an object "f" that would have the values of "d[,2]" in it based on "k", and again, "k" here is a vector that consists of some values of "d[,1]". Basically I am trying to match the values in "k" with their corresponding pairs in "d[,2]". So the result should look like: f <-c(4,4,6,6,7,7,6) appreciate your input Andras __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
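A base R alternative that needs no package is match(), which returns, for each element of `k`, its position in `a`; those positions can then index `b`. A sketch with the vectors from the question:

```r
a <- c(0, 1, 2, 3)
b <- c(4, 5, 6, 7)
k <- c(0, 0, 2, 2, 3, 3, 2)

# match(k, a) gives the row of each k value in a; use it to pick from b
f <- b[match(k, a)]
f
```

Values of `k` that do not occur in `a` would give NA in `f`.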
Re: [R] Subsetting multiple rows of a data frame at once
Hi Anika, ?merge() is a better solution. To get the row.names intact, you could do: carbon.fit<- within(carbon.fit,{x<-round(x,10);y<- round(y,10)}) #Using Bill's solution dat1<- data.frame(x=round(xt,10),y=round(yt,10)) carbon.fit1<- data.frame(carbon.fit,rNames=row.names(carbon.fit),stringsAsFactors=FALSE) #changed here res1<-merge(dat1,carbon.fit1,by=c("x","y")) row.names(res1)<- res1[,3] res1<- res1[,-3] A.K. - Original Message - From: William Dunlap To: arun ; Shaun ♥ Anika Cc: R help Sent: Thursday, July 4, 2013 8:02 PM Subject: RE: [R] Subsetting multiple rows of a data frame at once > xt<- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, > 1.20, 1.40, 1.90, > 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 3.20, 3.90, > 4.45) > > yt<- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, > 2.15, 2.75, 2.05, > 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 4.50, 3.35, > 4.90) > carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01))) > trees<-do.call(rbind,lapply(seq_along(xt),function(i) > subset(carbon.fit,x==xt[i]&y==yt[i]))) > > ## xt is 28 integers long and when i run the above code it only returns the > values of 18 > out of the 28 (xt,yt) pairs that i want. You are running into the problem that two different computational methods that give the same result when applied to real numbers often give different results when applied to 64-bit floating point numbers. (In your case you expect seq(0,5,.01) to contain, e.g., the floating point number generate by parsing the string "3.05".) Hence x==y is not true when you expect it to be. 
Here is where your 18 came from: R> table(xt %in% carbon.fit$x, yt %in% carbon.fit$y) FALSE TRUE FALSE 1 6 TRUE 3 18 Round your number to the nearest 10^-10 and you get > table(round(xt,10) %in% round(carbon.fit$x,10), round(yt,10) %in% round(carbon.fit$y,10)) TRUE TRUE 28 By the way, you may prefer using the merge() function rather than the do.call(rbind,lapply(...))) business. I think the following call to merge will do about what you want (the row names differ - if they are important it is possible to get them with some minor trickery): merge(data.frame(x=xt,y=yt), carbon.fit) (You still want to round your numbers as before.) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > Of arun > Sent: Wednesday, July 03, 2013 10:15 PM > To: Shaun ♥ Anika > Cc: R help > Subject: Re: [R] Subsetting multiple rows of a data frame at once > > Hi, > > carbon.fit = expand.grid(list(x=seq(0, 5, 0.01), y=seq(0, 5, 0.01))) > dim(carbon.fit) > #[1] 251001 2 > > > xtNew<-sprintf("%.2f",xt) > ytNew<- sprintf("%.2f",yt) > carbon.fit[]<- lapply(carbon.fit,function(x) sprintf("%.2f",x)) > res<-do.call(rbind,lapply(seq_along(xtNew),function(i) > subset(carbon.fit,x==xtNew[i]&y==ytNew[i]))) > nrow(res) > #[1] 28 > res > # x y > #12631 1.05 0.25 > #5296 2.85 0.10 > #45431 3.40 0.90 > #12951 4.25 0.25 > #52631 0.25 1.05 > #85476 3.05 1.70 > #103076 3.70 2.05 > #145311 0.20 2.90 > #117766 0.30 2.35 > #130331 0.70 2.60 > #127861 1.05 2.55 > #107836 1.20 2.15 > #137916 1.40 2.75 > #102896 1.90 2.05 > #135541 2.70 2.70 > #113051 3.25 2.25 > #128111 3.55 2.55 > #103166 4.60 2.05 > #183071 2.05 3.65 > #153021 2.15 3.05 > #150671 3.70 3.00 > #175836 4.85 3.50 > #188366 4.90 3.75 > #243146 1.60 4.85 > #225696 2.45 4.50 > #225771 3.20 4.50 > #168226 3.90 3.35 > #245936 4.45 4.90 > A.K. 
> > > > From: Shaun ♥ Anika > To: "smartpink...@yahoo.com" > Sent: Thursday, July 4, 2013 12:08 AM > Subject: RE: Subsetting multiple rows of a data frame at once > > > > > Hi There, > i can give you the data needed to perform this task... > > library(akima) > library(fields) > > xt<- c(1.05, 2.85, 3.40, 4.25, 0.25, 3.05, 3.70, 0.20, 0.30, 0.70, 1.05, > 1.20, 1.40, 1.90, > 2.70, 3.25, 3.55, 4.60, 2.05, 2.15, 3.70, 4.85, 4.90, 1.60, 2.45, 3.20, 3.90, > 4.45) > > yt<- c(0.25, 0.10, 0.90, 0.25, 1.05, 1.70, 2.05, 2.90, 2.35, 2.60, 2.55, > 2.15, 2.75, 2.05, > 2.70, 2.25, 2.55, 2.05, 3.65, 3.05, 3.00, 3.50, 3.75, 4.85, 4.50, 4.50, 3.35, > 4.90) > > xs<- c(0.45, 1.05, 2.75, 3.30, 4.95, 0.40, 1.05, 2.30, 3.45, 4.60, 0.05, > 1.95, 2.95, 3.70, > 4.55, 0.75, 1.60, 2.10, 3.60, 4.90, 0.05
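The floating-point issue Bill describes can also be sidestepped by comparing integer keys instead of decimals. The following is only a sketch under the assumption that all coordinates lie on a 0.01 grid; grid_key is a hypothetical helper name, and only three of the (xt, yt) pairs from the thread are used.

```r
# Grid from the thread and a few of the requested points
carbon.fit <- expand.grid(x = seq(0, 5, 0.01), y = seq(0, 5, 0.01))
pts <- data.frame(x = c(1.05, 2.85, 3.05), y = c(0.25, 0.10, 1.70))

# Hypothetical helper: express each coordinate in whole hundredths and
# build a text key, so no comparison relies on exact double equality
grid_key <- function(x, y) paste(round(x * 100), round(y * 100))

# Position of each requested point in the grid
idx <- match(grid_key(pts$x, pts$y), grid_key(carbon.fit$x, carbon.fit$y))
trees <- carbon.fit[idx, ]
nrow(trees)
```

Because match() is used, the original row names of carbon.fit are kept, which merge() does not do without extra steps.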
Re: [R] Filter Dataframe for Alarm for particular column(s).
Hi,

Maybe this helps. If you had shown your solution, it would be easier to compare.

res <- data.frame(lapply(sapply(MyDF[, c(2, 4)], function(x) {
  x1 <- which(c(0, diff(x)) < 0); x1[length(x1) == 0] <- 0; x1}), `[`, 1))
res
#  TNH BIX
#1   3   9

#Speed
set.seed(24)
MyDFNew <- data.frame(TNH = sample(0:1, 1e6, replace = TRUE), BIX = sample(0:1, 1e6, replace = TRUE))
system.time(res1 <- data.frame(lapply(sapply(MyDFNew, function(x) {
  x1 <- which(c(0, diff(x)) < 0); x1[length(x1) == 0] <- 0; x1}), `[`, 1)))
#   user  system elapsed
#  0.364   0.000   0.363
res1
#  TNH BIX
#1   7   2
MyDFNew[1:10, ]
#   TNH BIX
#1    0   1
#2    0   0
#3    1   1
#4    1   1
#5    1   0
#6    1   0
#7    0   1
#8    1   1
#9    1   1
#10   0   0

A.K.

Hi,

Here I have a data frame called MyDF.

a <- c(1,1,1,1,1,0,0,0,1,1)
b <- c(1,1,0,1,1,0,0,0,1,1)
c <- c(1,1,1,1,1,1,1,0,1,1)
d <- c(1,1,1,1,1,1,1,1,0,1)
MyDF <- data.frame(DWATT=a, TNH=b, CSGV=c, BIX=d)

My requirement: I need a function that returns, for particular column(s), the row number where the value changes from one to zero for the first time. If no such change happens, it should return zero.

For example, using MyDF:

DWATT TNH CSGV BIX
    1   1    1   1
    1   1    1   1
    1   0    1   1
    1   1    1   1
    1   1    1   1
    0   0    1   1
    0   0    1   1
    0   0    0   1
    1   1    1   0
    1   1    1   1

Here I want to know the row number where the TNH and BIX columns change from one to zero for the first time. The answer should be a data frame with a single row, like this:

TNH BIX
  3   9

I tried some solutions using loops, but there are many files with many rows to process, so performance is most important. Could someone please suggest better ideas?

Thanks,
Antony.
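The same rule (first one-to-zero change, zero if none) can be written as a small named helper, which may be easier to read and test than the nested sapply()/lapply() call. This is a sketch, not the code from the reply; first_drop is a hypothetical name.

```r
MyDF <- data.frame(
  DWATT = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  TNH   = c(1, 1, 0, 1, 1, 0, 0, 0, 1, 1),
  CSGV  = c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1),
  BIX   = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1)
)

# Hypothetical helper: row number of the first 1 -> 0 change, or 0 if none
first_drop <- function(x) {
  i <- match(TRUE, diff(x) < 0)  # position of the first negative step
  if (is.na(i)) 0L else i + 1L   # diff() positions are shifted by one row
}

res <- as.data.frame(lapply(MyDF[, c("TNH", "BIX")], first_drop))
res
#  TNH BIX
#1   3   9
```

match() stops being useful only if the column contains NA values; that case is not handled here.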
Re: [R] Operations on a big data frame
Hi,

Maybe this helps:

dat1 <- read.table(text="
P1_prom Nom
1 -6.17 Pt_00187
2 -6.17 Pt_00187
3 -6.17 Pt_00187
4 -6.17 Pt_00187
5 -6.17 Pt_00187
6 -6.17 Pt_01418
7 -5.77 Pt_01418
8 -5.37 Pt_01418
9 -4.97 Pt_01418
10 -4.57 Pt_01418
",sep="",header=TRUE,stringsAsFactors=FALSE)
library(zoo)
dat1$PT_promMean <- rollmean(dat1$P1_prom, 5, fill=NA, align="left")
dat1
#   P1_prom      Nom PT_promMean
#1    -6.17 Pt_00187       -6.17
#2    -6.17 Pt_00187       -6.17
#3    -6.17 Pt_00187       -6.09
#4    -6.17 Pt_00187       -5.93
#5    -6.17 Pt_00187       -5.69
#6    -6.17 Pt_01418       -5.37
#7    -5.77 Pt_01418          NA
#8    -5.37 Pt_01418          NA
#9    -4.97 Pt_01418          NA
#10   -4.57 Pt_01418          NA

A.K.

Hello all,

I have a big data frame that looks like this:

   P1_prom      Nom
1    -6.17 Pt_00187
2    -6.17 Pt_00187
3    -6.17 Pt_00187
4    -6.17 Pt_00187
5    -6.17 Pt_00187
6    -6.17 Pt_01418
7    -5.77 Pt_01418
8    -5.37 Pt_01418
9    -4.97 Pt_01418
10   -4.57 Pt_01418
- - -
25000

where Nom represents a point in a map, and P1_prom represents the value of an operation we performed on each point (note that we performed 5 repetitions for each point, hence each point has 5 values).

What I am trying to do, with no success, is to create a new column in which each row corresponds to the mean value of P1_prom for each point. So basically what I need the program to do is to write in the first row of the new column the average of the first five values of P1_prom, in the second row the average of the next five values, and so on.

Could anybody guide me on how to do this?

Thank you very much,
Veronica
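Note that rollmean() computes a moving average, so windows 2 to 5 above mix values from both points. If one mean per point is wanted, as the question describes, a base R sketch with ave() and aggregate() could look like the following; the column name P1_mean is made up for illustration.

```r
dat1 <- data.frame(
  P1_prom = c(-6.17, -6.17, -6.17, -6.17, -6.17,
              -6.17, -5.77, -5.37, -4.97, -4.57),
  Nom = rep(c("Pt_00187", "Pt_01418"), each = 5),
  stringsAsFactors = FALSE
)

# ave() returns the group mean repeated on every row of the group
dat1$P1_mean <- ave(dat1$P1_prom, dat1$Nom)

# aggregate() returns one row per point instead
per_point <- aggregate(P1_prom ~ Nom, data = dat1, FUN = mean)
per_point
```

Grouping by Nom rather than by blocks of five rows also keeps working if a point happens to have more or fewer than five repetitions.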
Re: [R] IF function
Hi,

Maybe this helps.

dat1 <- read.table(text="
Col1,Col2
High value,9
Low value,0
High value,7
Low value,0
Low value,0
No data,0
High value,8
No data,0
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat1$Col2[dat1$Col1=="No data"] <- NA
dat1
#        Col1 Col2
#1 High value    9
#2  Low value    0
#3 High value    7
#4  Low value    0
#5  Low value    0
#6    No data   NA
#7 High value    8
#8    No data   NA

A.K.

Hello,

I am an R novice so excuse me if this is woefully straightforward, but I have tried the help files to no avail. I am trying to identify cells in one column with the value of 'No data', so I can change the values in the next column to 'Null'. Currently I am struggling with the data set, as it assigns both 'No data' and 'Low values' as zero, which skews my analysis. I've tried a number of different attempts but just get the error "unexpected symbol".
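Since the subject asks about an IF function, the same replacement can be sketched with the vectorised ifelse(), which tests every row at once. The three-row data frame here is a shortened stand-in for the data above.

```r
dat1 <- data.frame(
  Col1 = c("High value", "Low value", "No data"),
  Col2 = c(9, 0, 0),
  stringsAsFactors = FALSE
)

# Where Col1 is "No data", use NA; otherwise keep the existing Col2 value
dat1$Col2 <- ifelse(dat1$Col1 == "No data", NA, dat1$Col2)
dat1
```

R represents missing values as NA rather than 'Null'; functions such as mean() can then skip them with na.rm = TRUE.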
Re: [R] geeglm
Hi,

Using the example from ?geeglm():

summary(gee1)$corr
#      Estimate Std.err
#alpha    0.957 0.00979

A.K.

----- Original Message -----
From: nt1006
To: r-help@r-project.org
Sent: Friday, July 5, 2013 9:40 AM
Subject: [R] geeglm

How can I extract the Std.err and the estimated alpha value from the geeglm function in R?
Re: [R] Subset and order
Hi,

You could also try ?data.table():

x <- read.table(text="a b c
1 2 3
3 3 4
2 4 5
1 3 4
",sep="",header=TRUE)
library(data.table)
xt <- data.table(x)
setkey(xt, a)
subset(xt, b==3)
#   a b c
#1: 1 3 4
#2: 3 3 4

iord <- order(x$a)
subset(x[iord, ], b == 3)
#  a b c
#4 1 3 4
#2 3 3 4

Speed comparison:

set.seed(12345)
dat1 <- as.data.frame(matrix(sample(1:10, 3*1e7, replace=TRUE), ncol=3))
colnames(dat1) <- letters[1:3]
system.time({
  iord <- order(dat1$a)
  res1 <- subset(dat1[iord, ], b == 3)
})
#   user  system elapsed
#  6.888   0.296   7.202

dt1 <- data.table(dat1)
system.time({
  setkey(dt1, a)
  resdt1 <- subset(dt1, b==3)
})
#   user  system elapsed
#   0.72    0.06    0.78

head(resdt1)
#   a b  c
#1: 1 3  6
#2: 1 3  4
#3: 1 3 10
#4: 1 3  2
#5: 1 3  9
#6: 1 3  8
head(res1)
#    a b  c
#75  1 3  6
#93  1 3  4
#300 1 3 10
#301 1 3  2
#437 1 3  9
#672 1 3  8

A.K.

----- Original Message -----
From: Rui Barradas
To: Noah Silverman
Cc: "R-help@r-project.org"
Sent: Friday, July 5, 2013 3:51 PM
Subject: Re: [R] Subset and order

Hello,

If time is one of the problems, precompute an ordered index, and use it every time you want the df sorted. But that would mean you can't do it in a single operation.

iord <- order(x$a)
subset(x[iord, ], b == 3)

Rui Barradas

On 05-07-2013 20:47, Noah Silverman wrote:
> That would work, but is painfully slow. It forces a new sort of the data
> with every query. I have 200,000 rows and need almost a hundred queries.
>
> Thanks,
>
> -N
>
> On Jul 5, 2013, at 12:43 PM, Rui Barradas wrote:
>
>> Hello,
>>
>> Maybe like this?
>>
>> subset(x[order(x$a), ], b == 3)
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 05-07-2013 20:33, Noah Silverman wrote:
>>> Hello,
>>>
>>> I have a data frame with several columns.
>>>
>>> I'd like to select some subset *and* order by another field at the same
>>> time.
>>>
>>> Example:
>>>
>>> a b c
>>> 1 2 3
>>> 3 3 4
>>> 2 4 5
>>> 1 3 4
>>> etc…
>>>
>>> I want to select all rows where b=3 and then order by a.
>>>
>>> To subset is easy: x[x$b==3,]
>>> To order is easy: x[order(x$a),]
>>>
>>> Is there a way to do both in a single efficient statement?
>>>
>>> Thanks,
>>>
>>> --
>>> Noah Silverman, M.S., C.Phil
>>> UCLA Department of Statistics
>>> 8117 Math Sciences Building
>>> Los Angeles, CA 90095
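Another base R option, sketched here as an assumption rather than something proposed in the thread, is to filter first and sort only the rows that survive, so each query orders a small subset instead of the whole data frame. No timing is claimed for it.

```r
x <- data.frame(a = c(1, 3, 2, 1), b = c(2, 3, 4, 3), c = c(3, 4, 5, 4))

# Keep the rows with b == 3, then order just those rows by a
s <- x[x$b == 3, ]
s <- s[order(s$a), ]
s
#  a b c
#4 1 3 4
#2 3 3 4
```

The result matches the subset(x[iord, ], b == 3) output shown above, including the original row names.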
Re: [R] Need hep for converting date data in POSIXct
Hi,

I am not sure what your dataset looks like. If it is like the one below (otherwise, please provide a reproducible example using ?dput()):

dat1 <- read.table(text="
datetime
10/02/2010
02:30
11/02/2010
04:00
14/02/2010
06:30
",sep="",header=TRUE,stringsAsFactors=FALSE)
lst1 <- split(dat1, (seq_along(dat1$datetime)-1)%%2+1)
dat2 <- data.frame(datetime=as.POSIXct(paste(lst1[[1]][,1], lst1[[2]][,1]), format="%d/%m/%Y %H:%M"))
str(dat2)
#'data.frame':    3 obs. of  1 variable:
# $ datetime: POSIXct, format: "2010-02-10 02:30:00" "2010-02-11 04:00:00" ...
dat2
#             datetime
#1 2010-02-10 02:30:00
#2 2010-02-11 04:00:00
#3 2010-02-14 06:30:00

#or
data.frame(datetime=as.POSIXct(paste(dat1[seq(1,nrow(dat1),by=2),1],
  dat1[seq(2,nrow(dat1),by=2),1]), format="%d/%m/%Y %H:%M"))
#             datetime
#1 2010-02-10 02:30:00
#2 2010-02-11 04:00:00
#3 2010-02-14 06:30:00

A.K.

Hey everybody,

I am a new user of R software. I don't know how I can merge two rows into one. In fact, I have one row with the date (dd/mm/) and another with the time (hh:mm) and I would like to get one row with date and time in order to convert to POSIXct. How can I do it?
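A self-contained variant of the same idea, with the time zone made explicit, might look like the sketch below. It assumes the dates and times alternate row by row as in the example; tz = "UTC" is an assumption added here, since without it as.POSIXct() uses the local time zone of the machine.

```r
dat1 <- data.frame(
  datetime = c("10/02/2010", "02:30", "11/02/2010", "04:00",
               "14/02/2010", "06:30"),
  stringsAsFactors = FALSE
)

# Odd rows hold the date, the following even rows hold the time
odd <- seq(1, nrow(dat1), by = 2)
stamp <- paste(dat1$datetime[odd], dat1$datetime[odd + 1])

dat2 <- data.frame(
  datetime = as.POSIXct(stamp, format = "%d/%m/%Y %H:%M", tz = "UTC")
)
format(dat2$datetime, "%Y-%m-%d %H:%M")
#[1] "2010-02-10 02:30" "2010-02-11 04:00" "2010-02-14 06:30"
```

Any string that does not fit the format becomes NA, so checking anyNA(dat2$datetime) after the conversion is a cheap sanity test.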
Re: [R] Splitting coordinates into two
Hi, vec1<- structure(c(. vec1 #[1] -22.576608,17.07859 -24.621739,17.959728 -26.567955,18.134651 #[4] -22.832516,17.183304 -21.980459,16.91328 #43 Levels: -17.394217,15.886574 -17.406994,14.393463 ... -28.017742,18.745594 G1<-sapply(strsplit(as.character(vec1),","),`[`,1) G2<-sapply(strsplit(as.character(vec1),","),`[`,2) G1 #[1] "-22.576608" "-24.621739" "-26.567955" "-22.832516" "-21.980459" A.K. - Original Message - From: Pancho Mulongeni To: "r-help@r-project.org" Cc: Sent: Monday, July 8, 2013 9:49 AM Subject: [R] Splitting coordinates into two Hi users, I have a simple vector of five coordinates in form of ('lat1, long1','lat2,long2',...,'latn,longn') And I would like to create two vectors, one just with the first coordinate G1<-c('lat1,'lat2',..,'latn') G2<-c('long1,'long2',...,'longn') I am trying to apply strsplit(x=g,split=',') on my object g, but it is not working, any help? I struggle to understand how to use the regular expressions. structure(c(32L, 38L, 40L, 34L, 27L), .Label = c("-17.394217,15.886574", "-17.406994,14.393463", "-17.491495,14.992905", "-17.5005,24.274635", "-17.776151,15.765724", "-17.779911,15.699806", "-17.905569,15.977211", "-17.921576,19.758911", "-18.607204,17.166481", "-18.804918,17.046661", "-18.805731,16.940403", "-19.030476,16.467304", "-19.12441,13.616567", "-19.163006,15.916443", "-19.243736,17.710304", "-19.562702,18.11697", "-19.6303,17.342606", "-19.939787,13.013306", "-20.107201,16.154966", "-20.363618,14.965954", "-20.460469,16.652012", "-20.484914,17.233429", "-21.256102,17.869263", "-21.418555,15.949402", "-21.491128,17.853234", "-21.943046,17.363892", "-21.980459,16.91328", "-22.000992,15.582733", "-22.084367,16.750031", "-22.182318,17.072754", "-22.447841,18.962746", "-22.576608,17.07859", "-22.649502,14.532166", "-22.832516,17.183304", "-22.934365,14.521008", "-22.947328,14.508991", "-24.45,15.801086", "-24.621739,17.959728", "-25.460983,19.438198", "-26.567955,18.134651", "-26.645292,15.153944", 
"-27.915553,17.490921", "-28.017742,18.745594" ), class = "factor") Pancho Mulongeni Research Assistant PharmAccess Foundation 1 Fouché Street Windhoek West Windhoek Namibia Tel: +264 61 419 000 Fax: +264 61 419 001/2 Mob: +264 81 4456 286 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
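If numeric latitude and longitude vectors are needed rather than character strings, the split can be sketched as below. Only three of the coordinate strings from the post are used, and the sketch assumes every element contains exactly one comma.

```r
vec1 <- factor(c("-22.576608,17.07859", "-24.621739,17.959728",
                 "-26.567955,18.134651"))

# strsplit() needs character input, so convert the factor first;
# fixed = TRUE treats the comma literally, no regular expression involved
parts <- do.call(rbind, strsplit(as.character(vec1), ",", fixed = TRUE))

G1 <- as.numeric(parts[, 1])  # first coordinate of each pair
G2 <- as.numeric(parts[, 2])  # second coordinate of each pair
G1
G2
```

Passing the factor directly to strsplit() is the likely reason the original attempt failed, because strsplit() rejects non-character arguments.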
Re: [R] Need hep for converting date data in POSIXct
Hi Laila, There is only one column from the dput() output. time1<- structure(list(date str(time1) #'data.frame': 20 obs. of 1 variable: # $ date: Factor w/ 582 levels "01/01/2009 01:58",..: 370 389 390 409 410 429 430 450 451 471 .. time1[,1]<-as.POSIXct(time1[,1],format="%d/%m/%Y %H:%M") head(time1) # date #1 2008-11-20 12:23:00 #2 2008-11-21 00:33:00 #3 2008-11-21 12:29:00 #4 2008-11-22 00:29:00 #5 2008-11-22 12:39:00 #6 2008-11-23 00:50:00 A.K. From: laila Aranda Romero To: arun Sent: Monday, July 8, 2013 4:29 PM Subject: RE: [R] Need hep for converting date data in POSIXct Arun, When I type dput(head(time,20), it appears this: structure(list(date = structure(c(370L, 389L, 390L, 409L, 410L, 429L, 430L, 450L, 451L, 471L, 472L, 491L, 492L, 511L, 512L, 531L, 532L, 549L, 550L, 567L), .Label = c("01/01/2009 01:58", "01/01/2009 13:57", "01/02/2009 03:49", "01/02/2009 15:51", "01/03/2009 04:40", "01/03/2009 16:37", "01/04/2009 04:21", "01/04/2009 16:33", "01/05/2009 04:33", "01/05/2009 16:31", "01/06/2009 03:11", "01/06/2009 15:10", "01/07/2009 02:49", "01/07/2009 14:46", "01/08/2009 02:44", "01/08/2009 14:44", "01/09/2009 01:05", "01/09/2009 13:14", "01/12/2008 00:58", "01/12/2008 12:53", "02/01/2009 02:01", "02/01/2009 13:58", "02/02/2009 03:59", "02/02/2009 15:58", "02/03/2009 04:37", "02/03/2009 16:25", "02/04/2009 04:30", "02/04/2009 16:30", "02/05/2009 04:33", "02/05/2009 16:31", "02/06/2009 02:52", "02/06/2009 14:57", "02/07/2009 02:47", "02/07/2009 14:51", "02/08/2009 02:42", "02/08/2009 14:42", "02/09/2009 01:14", "02/09/2009 13:19", "03/01/2009 01:52", "03/01/2009 13:57", "03/02/2009 03:55", "03/02/2009 15:56", "03/03/2009 04:21", "03/03/2009 16:29", "03/04/2009 04:39", "03/04/2009 16:29", "03/05/2009 04:27", "03/05/2009 16:24", "03/06/2009 02:53", "03/06/2009 14:48", "03/07/2009 02:55", "03/07/2009 14:54", "03/08/2009 02:36", "03/08/2009 14:28", "03/09/2009 01:32", "03/09/2009 13:37", "04/01/2009 01:57", "04/01/2009 13:57", "04/02/2009 03:55", "04/02/2009 
15:50", "04/03/2009 04:35", "04/03/2009 16:35", "04/04/2009 04:28", "04/04/2009 16:36", "04/05/2009 04:43", "04/05/2009 16:43", "04/06/2009 02:36", "04/06/2009 14:40", "04/07/2009 02:49", "04/07/2009 14:48", "04/08/2009 02:40", "04/08/2009 14:38", "04/09/2009 01:45", "04/09/2009 13:54", "05/01/2009 02:02", "05/01/2009 14:01", "05/02/2009 03:51", "05/02/2009 15:49", "05/03/2009 04:35", "05/03/2009 16:40", "05/04/2009 04:36", "05/04/2009 16:29", "05/05/2009 04:18", "05/05/2009 16:13", "05/06/2009 02:41", "05/06/2009 14:22", "05/07/2009 02:50", "05/07/2009 14:57", "05/08/2009 02:31", "05/08/2009 14:28", "05/09/2009 02:08", "05/09/2009 14:13", "06/01/2009 01:55", "06/01/2009 13:52", "06/02/2009 03:54", "06/02/2009 15:55", "06/03/2009 04:39", "06/03/2009 16:40", "06/04/2009 04:20", "06/04/2009 16:19", "06/05/2009 03:56", "06/05/2009 15:49", "06/06/2009 02:20", "06/06/2009 14:26", "06/07/2009 03:10", "06/07/2009 15:05", "06/08/2009 02:35", "06/08/2009 14:35", "06/09/2009 02:10", "06/09/2009 14:01", "06/12/2008 12:27", "07/01/2009 01:54", "07/01/2009 13:38", "07/02/2009 03:49", "07/02/2009 15:50", "07/03/2009 04:53", "07/03/2009 16:33", "07/04/2009 04:23", "07/04/2009 16:22", "07/05/2009 03:33", "07/05/2009 15:34", "07/06/2009 02:40", "07/06/2009 14:59", "07/07/2009 02:52", "07/07/2009 14:55", "07/08/2009 02:34", "07/08/2009 14:37", "07/09/2009 01:59", "07/09/2009 13:45", "07/12/2008 00:28", "07/12/2008 12:33", "08/01/2009 01:23", "08/01/2009 13:09", "08/02/2009 03:52", "08/02/2009 15:51&q
Re: [R] regular expression strikes again
Hi,

Maybe this helps:

gsub(".*\\w+\\s+(.*)\\s+.*","\\1",test)
#[1] "9,36"  "9,36"  "9,66"  "9,66"  "9,66"  "10,04" "10,04" "10,04" "6,13"
#[10] "6,13"  "6,13"

A.K.

----- Original Message -----
From: PIKAL Petr
To: r-help
Sent: Tuesday, July 9, 2013 5:45 AM
Subject: [R] regular expression strikes again

Dear experts in regexpr.

I have this

dput(test[500:510])
c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
"pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

and I want something like this

gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
[1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
[11] "6,13"

but with 10,04 values instead of 0,04. I tried

gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])

or other variations but without any success.

Please help.

Regards
Petr
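The reason Petr's second attempt still returns "0,04" is that the leading ".*" is greedy: it consumes as much as it can, including the "1" of "10", and leaves only a single digit for "[[:digit:]]+". One way around this, sketched here as an alternative to the gsub() call above, is to extract the match directly instead of deleting everything around it.

```r
test <- c("pH 9,36 2", "pH 10,04 1", "RGLP 144006 pH 6,13 1")

# regexpr() locates the first "digits,digits" run in each string and
# regmatches() pulls out exactly that piece
vals <- regmatches(test, regexpr("[0-9]+,[0-9]+", test))
vals
#[1] "9,36"  "10,04" "6,13"
```

regmatches() silently drops elements with no match, so the result can be shorter than the input if some strings contain no decimal-comma number.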
Re: [R] Labelling
Hi,

Maybe this helps:

gsub("_"," ",gsub("(.*)_.*","\\1",DATA_names))
#[1] "A ugkg"  "S mgkg"  "Cl mgkg"
sapply(gsub("_"," ",gsub("(.*)_.*","\\1",DATA_names)),f)
$`A ugkg`
A ~ (mu * g ~ kg^{
    -1
})

$`S mgkg`
S ~ (mg ~ kg^{
    -1
})

$`Cl mgkg`
Cl ~ (mg ~ kg^{
    -1
})

A.K.

----- Original Message -----
From: Shane Carey
To: "r-help@r-project.org"
Sent: Tuesday, July 9, 2013 7:20 AM
Subject: [R] Labelling

Hi,

I have the following data as labels:

DATA_names<-c("A_ugkg_FA","S_mgkg_XRF" ,"Cl_mgkg_XR")

and I need to convert to

A (ug kg^-1)
S (mg kg^-1)
Cl (mg kg^-1)

I used the following piece of code to convert the following labels in the past, but can't get it to work for the new labels:

f <- function (name)
{
  # add other suffices and their corresponding plotmath expressions to the list
  env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
                       ugkg = bquote(mu * g ~ kg^{-1})),
                  parent = emptyenv())
  pattern <- paste0("(", paste(objects(env), collapse="|"), ")")
  bquoteExpr <- parse(text=gsub(pattern,
                                "~(.(\\1))",
                                name))[[1]]
  # I use do.call() to work around the fact that bquote's first argument is not evaluated.
  do.call(bquote, list(bquoteExpr, env))
}

The labels in the past were:

DATA_names<-c("A_ugkg","S_mgkg" ,"Cl_mgkg")

Thanks

--
Shane
Re: [R] Labelling
Hi,

Try this:

f1 <- function(name) {
  env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
                       ugkg = bquote(mu * g ~ kg^{-1})),
                  parent = emptyenv())
  pattern <- paste0("(", paste(objects(env), collapse="|"), ")")
  bquoteExpr <- parse(text=gsub("_"," ",gsub(pattern,"~(.(\\1))~",name)))[[1]]
  do.call(bquote, list(bquoteExpr, env))
}
sapply(DATA_names,f1)
$A_ugkg_FA
A ~ (mu * g ~ kg^{
    -1
}) ~ FA

$S_mgkg_XRF
S ~ (mg ~ kg^{
    -1
}) ~ XRF

$Cl_mgkg_XR
Cl ~ (mg ~ kg^{
    -1
}) ~ XR

A.K.

From: Shane Carey
To: arun
Cc: R help
Sent: Tuesday, July 9, 2013 8:57 AM
Subject: Re: [R] Labelling

Initially, I wanted to remove the suffixes, but now I want to end up with the following:

c("A_ugkg_FA","S_mgkg_XRF" ,"Cl_mgkg_XR")

A (ug kg^-1) FA
S (mg kg^-1) XRF
Cl (mg kg^-1) XR

Thanks all

--
Shane
Re: [R] Kruskal.test
Hi,

See ?kruskal.test():

a <- c(2,4,5,2,7)
b <- c(2,2,6)
c <- c(3,7,9,3)
kruskal.test(list(a,b,c))
#
# Kruskal-Wallis rank sum test
#
#data:  list(a, b, c)
#Kruskal-Wallis chi-squared = 2.003, df = 2, p-value = 0.3673

A.K.

Hi,

I need an expression in R to apply a kruskal.test to this data (for example):

a a a a a b b b c c c c
2 4 5 2 7 2 2 6 3 7 9 3

a, b and c could be considered different vectors. How can I apply this test to this data? (Probably the data isn't good for this test, but I only need the expression.)

Thank you very much
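When the values already sit in one column with a grouping column next to them, the formula interface of kruskal.test() may be more convenient than the list form. The sketch below rebuilds the same twelve values in long format; the name cc is used instead of c only to avoid masking the c() function.

```r
a  <- c(2, 4, 5, 2, 7)
b  <- c(2, 2, 6)
cc <- c(3, 7, 9, 3)

# Long format: one value column and one grouping factor
dat <- data.frame(
  value = c(a, b, cc),
  group = factor(rep(c("a", "b", "c"), times = c(5, 3, 4)))
)

res <- kruskal.test(value ~ group, data = dat)
res
```

Both interfaces run the same test, so the statistic and p-value agree with the kruskal.test(list(a, b, c)) call above.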
Re: [R] Replacing part of delimited string with R's regex
Hi,

You could use:

gsub("([[:alnum:]]+-)([[:alnum:]]+-)(.*)","\\1\\2zzz",name)
#[1] "hsa-miR-zzz" "hsa-miR-zzz" "hsa-let-zzz"

A.K.

----- Original Message -----
From: Gundala Viswanath
To: "r-h...@stat.math.ethz.ch"
Sent: Wednesday, July 10, 2013 3:02 AM
Subject: [R] Replacing part of delimited string with R's regex

I have the following list of strings:

name <- c("hsa-miR-555p","hsa-miR-519b-3p","hsa-let-7a")

What I want to do is, for each of the above strings, replace the text after the second delimiter with "zzz". Yielding:

hsa-miR-zzz
hsa-miR-zzz
hsa-let-zzz

What's the way to do it?
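A slightly more explicit variant, offered as a sketch, anchors the pattern at the start of the string and describes each field as "anything except the delimiter". This also works when the first two fields contain characters other than letters and digits.

```r
name <- c("hsa-miR-555p", "hsa-miR-519b-3p", "hsa-let-7a")

# Keep the first two dash-terminated fields, replace the rest with "zzz"
out <- sub("^([^-]*-[^-]*-).*$", "\\1zzz", name)
out
#[1] "hsa-miR-zzz" "hsa-miR-zzz" "hsa-let-zzz"
```

sub() is enough here because each string needs only one replacement; strings with fewer than two dashes are returned unchanged.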
Re: [R] Kruskal.test
Hi,

Please dput() your example dataset.

dat1 <- read.table(text="a a a a a b b b c c c c
2 4 5 2 7 2 2 6 3 7 9 3
3 3 4 1 6 8 1 3 5 2 6 3",sep="",header=FALSE,stringsAsFactors=FALSE)
library(reshape)
dat2 <- melt(as.data.frame(t(dat1)),id.var="V1")[,-2]
kruskal.test(value~V1,data=dat2)
#
# Kruskal-Wallis rank sum test
#
#data:  value by V1
#Kruskal-Wallis chi-squared = 1.2888, df = 2, p-value = 0.525

#I guess you wanted the test for each row:
lapply(split(dat2,(seq_len(nrow(dat2))-1)%/%ncol(dat1)+1),function(x) kruskal.test(value~V1,data=x))
#$`1`
#
# Kruskal-Wallis rank sum test
#
#data:  value by V1
#Kruskal-Wallis chi-squared = 2.003, df = 2, p-value = 0.3673
#
#$`2`
#
# Kruskal-Wallis rank sum test
#
#data:  value by V1
#Kruskal-Wallis chi-squared = 0.1231, df = 2, p-value = 0.9403

A.K.

From: Vera Costa
To: arun
Sent: Wednesday, July 10, 2013 6:38 AM
Subject: Re: Kruskal.test

Thank you. And if I have

a a a a a b b b c c c c
2 4 5 2 7 2 2 6 3 7 9 3
3 3 4 1 6 8 1 3 5 2 6 3

how can I apply the test by row?

Thank you
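The row-wise test can also be sketched without reshaping, by keeping the group labels in a factor and the values in a numeric matrix. This assumes the labels are the same for every row, as in the example; the names g, vals and pvals are made up for illustration.

```r
# Group labels shared by all rows
g <- factor(rep(c("a", "b", "c"), times = c(5, 3, 4)))

# One row per data set to be tested
vals <- rbind(c(2, 4, 5, 2, 7, 2, 2, 6, 3, 7, 9, 3),
              c(3, 3, 4, 1, 6, 8, 1, 3, 5, 2, 6, 3))

# kruskal.test(x, g) takes a numeric vector and a grouping factor;
# apply() runs it once per row and keeps only the p-value
pvals <- apply(vals, 1, function(v) kruskal.test(v, g)$p.value)
pvals
```

Keeping the values numeric from the start avoids the character columns that read.table() produces when the labels are read as the first data row.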
Re: [R] Filter Dataframe for Alarm for particular column(s).
Hi,

You could try ?data.table() to further increase the speed:

#Same example:
dt2 <- data.table(MyDFNew)
system.time(resNew <- dt2[,lapply(.SD,function(x) {x1<-which(c(0,diff(x))<0);x1[length(x1)==0]<-0;x1})][1])
#   user  system elapsed
#  0.144   0.004   0.148
resNew
#   TNH BIX
#1:   7   2

According to this link (http://stackoverflow.com/questions/9236438/how-do-i-run-apply-on-a-data-table), using a for loop should improve the speed further.

Regarding the use of ts() in this case, I am not very sure.

A.K.

----- Original Message -----
From: R_Antony
To: r-help@r-project.org
Sent: Wednesday, July 10, 2013 1:48 AM
Subject: Re: [R] Filter Dataframe for Alarm for particular column(s).

Hi Arun,

Thanks for the solution, it really works! But how can we avoid even lapply() and sapply()? Is there any way to do this with ts()?

Thanks,
Antony.
Re: [R] create new matrix from user-defined function
Hi,
You could try:

mat1<-matrix(dat3[rowSums(dat3[,2:3])!=dat3[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))
mat1
#     MW_EEsDue_ERRORS
#[1,]             1882
#[2,]             1884
#[3,]             1885
A.K.

#Let's say I have the following data set:
dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                  B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                  C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                  D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))
dat3
#   A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal
# 1     1881            2            5                7
# 2     1882            2            5                9
# 3     1883            1            4                5
# 4     1884            4            1                6
# 5     1885            6            6              112

# I want to:
#CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s WHERE "D != B + C"
#THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this example should be:
#   MW_EEsDue_ERRORS
# 1             1882
# 2             1884
# 3             1885
#What is the best way to do this? Thanks for your time. BNC
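Since the subject line asks for a user-defined function, the same comparison can be wrapped in one that takes column names instead of positions. This is only a sketch: find_total_errors() and its argument names are made up for illustration and do not come from the thread. It assumes the component and total columns are numeric and free of NA, and it compares with an exact !=, which is fine for whole numbers but may need a tolerance for fractional values.

```r
# Hypothetical helper (not from the thread): return the IDs whose reported
# total differs from the sum of the component columns.
find_total_errors <- function(df, id, parts, total, label = "MW_EEsDue_ERRORS") {
  stopifnot(all(c(id, parts, total) %in% names(df)))
  bad <- rowSums(df[parts]) != df[[total]]   # TRUE where D != B + C
  matrix(df[[id]][bad], ncol = 1, dimnames = list(NULL, label))
}

dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

errs <- find_total_errors(dat3, id = "A_CaseID",
                          parts = c("B_MW_EEsDue1", "C_MW_EEsDue2"),
                          total = "D_MW_EEsDueTotal")
errs
```

Referring to columns by name keeps the call readable when the real data frame has many more columns than this example.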
Re: [R] Need help for converting date data in POSIXct
Hi, I guess the error message: > vmask(lat,lon,time,vmax=25) Error en vmask(lat, lon, time, vmax = 25) : objeto 'lat' no encontrado says that you have not defined the object 'lat'. time<-subset(Geo, select =date) time[,1]<- as.POSIXct(time[,1],format="%d/%m/%Y %H:%M") location<- subset(Geo,select=c(lat.comp,long)) time1<- time[,1] lat<- location[,1] long<- location[,2] library(argosfilter) vmask(lat,long,time1,25) #[1] "end_location" "end_location" "not" "not" "end_location" #[6] "end_location" A.K. From: laila Aranda Romero To: arun Sent: Wednesday, July 10, 2013 6:21 PM Subject: RE: [R] Need hep for converting date data in POSIXct Hi, The code: library(argosfilter) setwd("C:/Users/Usuario/Dropbox/Laila Aranda/PUFGRA") Geo = read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",header=FALSE,sep = ",", col.names= c("type", "date", "secs", "Trans1", "Trans2", "lat.sta", "lat.comp", "long", "dist", "rumbo", "velocidad", "confianza")) View(Geo) location=subset(Geo, select= c(lat.comp,long)) time=subset(Geo, select =c(date)) time[,1]<-as.POSIXct(time[,1],format="%d/%m/%Y %H:%M") vmask(lat,lon,time,vmax=25) The example: library(argosfilter) > setwd("C:/Users/Usuario/Dropbox/LailaAranda/PUFGRA") > Geo = > read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",header=FALSE,sep = ",", col.names= c("type", "date","secs", "Trans1", "Trans2", "lat.sta", "lat.comp", "long", "dist", "rumbo", "velocidad", "confianza")) > str(Geo) 'data.frame': 582 obs. of 12 variables: $ type : Factor w/ 2 levels "midnight","noon": 2 1 2 1 2 1 2 1 2 1 ... $ date : Factor w/ 582 levels "01/01/2009 01:58",..: 370 389 390 409 410 429 430 450 451 471 ... $ secs : num 39773 39773 39774 39774 39775 ... $ Trans1 : Factor w/ 186 levels "04:06","04:08",..: 14 17 17 16 16 28 28 19 19 15 ... $ Trans2 : Factor w/ 159 levels "00:01","00:03",..: 30 30 28 28 34 34 35 35 36 36 ... $ lat.sta : num -42.7 -39.1 -37.8 -37.9 -41.2 ... $ lat.comp : num -42.7 -40.6 -38.6 -37.9 -39 ... 
$ long : num 9.31 11.66 10.88 10.72 13.06 ... $ dist : num 0 0 127 45 131 ... $ rumbo : num 0 0 -16.49 -9.64 -57.22 ... $ velocidad: num 0 0 10.64 3.75 10.75 ... $ confianza: int 3 9 9 9 9 6 6 9 9 9 ... > head(Geo) type date secs Trans1 Trans2 lat.sta lat.comp long dist 1 noon 20/11/2008 12:23 39772.52 04:59 19:47 -42.72 -42.72 9.31 0.00 2 midnight 21/11/2008 00:33 39773.02 05:18 19:47 -39.14 -40.63 11.66 0.00 3 noon 21/11/2008 12:29 39773.52 05:18 19:41 -37.82 -38.60 10.88 127.02 4 midnight 22/11/2008 00:29 39774.02 05:17 19:41 -37.86 -37.86 10.72 45.04 5 noon 22/11/2008 12:39 39774.53 05:17 20:00 -41.21 -39.04 13.06 130.78 6 midnight 23/11/2008 00:50 39775.03 05:41 20:00 -36.56 -38.51 16.02 142.06 rumbo velocidad confianza 1 0.00 0.00 3 2 0.00 0.00 9 3 -16.49 10.64 9 4 -9.64 3.75 9 5 -57.22 10.75 9 6 77.07 11.66 6 > location=subset(Geo, select= c(lat.comp,long)) > str(location) 'data.frame': 582 obs. of 2 variables: $lat.comp: num -42.7 -40.6 -38.6 -37.9 -39 ... $long : num 9.31 11.66 10.88 10.72 13.06 ... > head(location) lat.comp long 1 -42.72 9.31 2 -40.63 11.66 3 -38.60 10.88 4 -37.86 10.72 5 -39.04 13.06 6 -38.51 16.02 > time=subset(Geo, select =c(date)) > time[,1]<-as.POSIXct(time[,1],format="%d/%m/%Y %H:%M") > str(time) 'data.frame': 582 obs. of 1 variable: $ date: POSIXct, format: "2008-11-20 12:23:00" "2008-11-21 00:33:00" ... > head(time) date 1 2008-11-20 12:23:00 2 2008-11-21 00:33:00 3 2008-11-21 12:29:00 4 2008-11-22 00:29:00 5 2008-11-22 12:39:00 6 2008-11-23 00:50:00 > vmask(lat,lon,time,vmax=25) Error en vmask(lat, lon, time, vmax = 25) : objeto 'lat' no encontrado __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
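The error in this thread comes from passing names (lat, lon) that were never created, and from passing a one-column data frame where a plain vector is expected. A small sketch of the preparation step follows. The three-row Geo below is a made-up stand-in for the real file, and tz = "UTC" is an assumption, since the thread does not say which time zone the logger used.

```r
# Hypothetical stand-in for the first rows of Geo (the real data come from the .trj file)
Geo <- data.frame(
  date     = c("20/11/2008 12:23", "21/11/2008 00:33", "21/11/2008 12:29"),
  lat.comp = c(-42.72, -40.63, -38.60),
  long     = c(9.31, 11.66, 10.88),
  stringsAsFactors = TRUE   # mimic the factor column shown by str(Geo)
)

# Convert via as.character() so the factor labels, not the integer codes, are parsed
dtime <- as.POSIXct(as.character(Geo$date), format = "%d/%m/%Y %H:%M", tz = "UTC")

# Plain numeric vectors, extracted with $ rather than subset()
lat <- Geo$lat.comp
lon <- Geo$long

# These three vectors are what the thread's call would receive, e.g.
# vmask(lat, lon, dtime, 25) after library(argosfilter)
str(dtime)
```

Checking anyNA(dtime) after the conversion is a quick way to spot rows whose date string did not match the format.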
Re: [R] calculate time from dates
Hi,
Maybe this helps:

dat1<- read.table(text="
ID date
1 4/12/2008
1 4/13/2008
1 5/11/2008
2 3/21/2009
2 4/22/2009
2 8/05/2009
",sep="",header=TRUE,stringsAsFactors=FALSE)
library(mondate)
M1<- mondate(dat1[,2])
M2<- mondate("01/01/2008")
dat1$month<-as.numeric(abs(floor(MonthsBetween(M1,M2))))
dat1
#  ID      date month
#1  1 4/12/2008     4
#2  1 4/13/2008     4
#3  1 5/11/2008     5
#4  2 3/21/2009    15
#5  2 4/22/2009    16
#6  2 8/05/2009    20
A.K.

- Original Message -
From: Gallon Li
To: r-help
Cc:
Sent: Thursday, July 11, 2013 5:56 AM
Subject: [R] calculate time from dates

My data are from 2008 to 2010, with repeated measures for the same subjects. I wish to compute the number of months since January 2008. The data are like the following:

ID date
1 4/12/2008
1 4/13/2008
1 5/11/2008
2 3/21/2009
2 4/22/2009
2 8/05/2009
...

The date column is in the format "%m/%d/%y". I wish to obtain

ID month
1 4
1 4
1 5
2 15
2 16
2 20
...

Also, for the same ID with two identical months, I only want to keep the last one. Can any expert help with this question?
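The second part of the question (keeping only the last row when an ID has two dates in the same month) is not covered by the reply above. A base R sketch of both steps, without mondate, might look like this. It assumes four-digit years (so "%Y" rather than the "%y" mentioned in the question), that January 2008 counts as month 1 as in the expected output, and that rows are already in date order within each ID.

```r
dat1 <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                   date = c("4/12/2008", "4/13/2008", "5/11/2008",
                            "3/21/2009", "4/22/2009", "8/05/2009"),
                   stringsAsFactors = FALSE)

d  <- as.Date(dat1$date, format = "%m/%d/%Y")   # %Y: four-digit year
lt <- as.POSIXlt(d)

# lt$year is years since 1900 and lt$mon is 0-based, hence the offsets
dat1$month <- (lt$year + 1900 - 2008) * 12 + lt$mon + 1

# Keep only the last row of each ID/month pair
last_rows <- dat1[!duplicated(dat1[c("ID", "month")], fromLast = TRUE), ]
last_rows
```

duplicated(..., fromLast = TRUE) marks every row that has a later duplicate, so negating it leaves the final occurrence of each pair.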
Re: [R] Read a txt file as numeric
Hi,
Maybe this helps:

dat1<- read.table(text="
142,QUANTIZE_CAL_MIN_BAND_10,1
143,QUANTIZE_CAL_MAX_BAND_11,65535
144,QUANTIZE_CAL_MIN_BAND_11,1
145,END_GROUP,MIN_MAX_PIXEL_VALUE
146,GROUP,RADIOMETRIC_RESCALING
147,RADIANCE_MULT_BAND_1,1.2483E-02
148,RADIANCE_MULT_BAND_2,1.2730E-02
",sep=",",header=FALSE,stringsAsFactors=FALSE,row.names=1)
#Assuming that 142, 143, etc. are row.names.
#You could create a new column with just the numeric values, leaving the strings in the 2nd column.
dat1$NewCol<-as.numeric(ifelse(grepl("\\d+",dat1[,2]),dat1[,2],NA))
dat1[,2][grepl("\\d+",dat1[,2])]<-NA
dat1
#                          V2                    V3     NewCol
#142 QUANTIZE_CAL_MIN_BAND_10                  <NA> 1.0000e+00
#143 QUANTIZE_CAL_MAX_BAND_11                  <NA> 6.5535e+04
#144 QUANTIZE_CAL_MIN_BAND_11                  <NA> 1.0000e+00
#145                END_GROUP   MIN_MAX_PIXEL_VALUE         NA
#146                    GROUP RADIOMETRIC_RESCALING         NA
#147     RADIANCE_MULT_BAND_1                  <NA> 1.2483e-02
#148     RADIANCE_MULT_BAND_2                  <NA> 1.2730e-02
str(dat1)
#'data.frame': 7 obs. of 3 variables:
# $ V2    : chr "QUANTIZE_CAL_MIN_BAND_10" "QUANTIZE_CAL_MAX_BAND_11" "QUANTIZE_CAL_MIN_BAND_11" "END_GROUP" ...
# $ V3    : chr NA NA NA "MIN_MAX_PIXEL_VALUE" ...
# $ NewCol: num 1 65535 1 NA NA ...
A.K.

Hello,
I am relatively new to the R community. I have a .txt file containing the metafile with information regarding Landsat calibration parameters. It contains 2 columns: one with the description of the parameter and the other with the value of the parameter. The problem is that the column with the values also contains words in some cases, which I believe makes read.table() read the column as something other than numeric. This is an example of what it looks like:

142 QUANTIZE_CAL_MIN_BAND_10 1 142
143 QUANTIZE_CAL_MAX_BAND_11 65535 143
144 QUANTIZE_CAL_MIN_BAND_11 1 144
145 END_GROUP MIN_MAX_PIXEL_VALUE 145
146 GROUP RADIOMETRIC_RESCALING 146
147 RADIANCE_MULT_BAND_1 1.2483E-02 147
148 RADIANCE_MULT_BAND_2 1.2730E-02 148
149 RADIANCE_MULT_BAND_3 1.1656E-02 149

I need the left column to be read as numeric. Does anyone have a good suggestion on how to approach this problem? Thank you in advance.
Stefano
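An alternative to the regular-expression test is to let as.numeric() decide what is a number, which also copes with values such as 1.2483E-02 without a hand-written pattern. The sketch below uses a shortened, made-up excerpt of the metadata. The column names parameter and value are assumptions for illustration, not names from the Landsat file.

```r
meta <- read.table(text = "
QUANTIZE_CAL_MIN_BAND_10,1
QUANTIZE_CAL_MAX_BAND_11,65535
END_GROUP,MIN_MAX_PIXEL_VALUE
GROUP,RADIOMETRIC_RESCALING
RADIANCE_MULT_BAND_1,1.2483E-02
", sep = ",", header = FALSE, stringsAsFactors = FALSE,
   col.names = c("parameter", "value"), colClasses = "character")

# Non-numeric entries become NA; the coercion warning is expected, so it is silenced
meta$numeric_value <- suppressWarnings(as.numeric(meta$value))

# Named vector holding only the parameters that really are numbers
params <- setNames(meta$numeric_value, meta$parameter)
params <- params[!is.na(params)]
params[["RADIANCE_MULT_BAND_1"]]
```

Looking values up by parameter name avoids depending on the line numbers, which may differ between metadata files.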
Re: [R] Reading a list of filenames from a csv file
Hi, Try this: files1<-read.csv("files.csv",header=TRUE,stringsAsFactors=FALSE) str(files1) #'data.frame': 2 obs. of 2 variables: # $ Col1: chr "ANA110915004A_3PERIOD_TmAvg-rdata.csv" "ANA110915006A_3PERIOD_TmAvg-rdata.csv" # $ Col2: chr "Pre-DA" "DA-10^-6" files1 # Col1 Col2 #1 ANA110915004A_3PERIOD_TmAvg-rdata.csv Pre-DA #2 ANA110915006A_3PERIOD_TmAvg-rdata.csv DA-10^-6 #Using some fake data lapply(seq_len(nrow(files1)),function(i) {x1<-read.csv(file=files1[i,1],header=TRUE,sep="",check.names=FALSE);x1[files1[i,2]]}) [[1]] # Pre-DA #1 2 #2 3 #3 6 #4 4 #[[2]] # DA-10^-6 #1 9 #2 14 #3 13 #4 21 Hope this helps. A.K. - Original Message - From: Jannetta Steyn To: r-help Cc: Sent: Thursday, July 11, 2013 9:01 AM Subject: [R] Reading a list of filenames from a csv file What would be the best way to read a list of filenames and headings from a csv file? The CSV file is structured as two columns, with column one being the filename and column two being a heading e.g.: ANA110915004A_3PERIOD_TmAvg-rdata.csv,Pre-DA ANA110915006A_3PERIOD_TmAvg-rdata.csv,DA-10^-6 ANA110915012A_3PERIOD_TmAvg-rdata.csv,DA-10^-4 ANA110915016A_3PERIOD_TmAvg-rdata.csv,Washout I want to be able to open the file using read.csv and use the heading as the header of a graph. Reading the filenames from the directory with list.files() works but then I don't have the headings that go with the file e.g.: filenames<-list.files(pattern="*.csv") for (i in seq_along(filenames)) { con<-read.csv(filenames[i], headers=TRUE, sep=',') } I tried the code below (which I posted in a different thread) but the solutions that people offered me didn't get it to work. 
The code results in 'Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection # Read filenames from csv file files <- read.csv(file="files.csv",head=FALSE,sep=",") # for each filename read the file for (i in 1:length(files)) { # f becomes the next row inthe file f<-files[i,] # the header to be used for the graph is in column 2 of f head=f[2] par(mfrow=c(4,2)) # the filename to be used is in column 1 of f con<-read.csv(file=f[1], header=TRUE, sep=',') tmp<-con$value2 data<-normalize_js(tmp,-1,1) time<-con$time # run the waveform analyser waveformanalyser(data,time,head) } Regards Jannetta -- === Web site: http://www.jannetta.com Email: janne...@henning.org === [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
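Two things in the quoted loop seem to lead to the "'file' must be a character string or connection" error. First, length(files) on a data frame counts columns, not rows. Second, f[1] taken from a one-row data frame is still a data frame (and, with the defaults of that time, a factor column), not a character string. A self-contained sketch of a corrected loop is below. The files a.csv and b.csv and their contents are invented so the example can run anywhere, and the analysis step is replaced by simply storing the column.

```r
# Set up a throwaway directory with two hypothetical data files and a file list
tmp <- tempfile("filelist_demo")
dir.create(tmp)
write.csv(data.frame(time = 1:3, value2 = c(0.1, 0.5, 0.9)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(time = 1:3, value2 = c(1, 2, 4)),
          file.path(tmp, "b.csv"), row.names = FALSE)
writeLines(c("a.csv,Pre-DA", "b.csv,DA-10^-6"), file.path(tmp, "files.csv"))

# Read the list as character columns, with explicit names
files <- read.csv(file.path(tmp, "files.csv"), header = FALSE,
                  stringsAsFactors = FALSE,
                  col.names = c("filename", "heading"))

results <- list()
for (i in seq_len(nrow(files))) {        # iterate over rows, not columns
  f       <- files$filename[i]           # a plain character string
  heading <- files$heading[i]            # title to use for the graph
  con     <- read.csv(file.path(tmp, f), header = TRUE)
  results[[heading]] <- con$value2       # stand-in for the real analysis call
}
names(results)
```

In the real script the last line of the loop body would be replaced by the normalising and plotting calls, with heading passed as the title.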
Re: [R] LDA and confidence ellipse
Hi,
Maybe this helps:

require(MASS)
require(ggplot2)
iris.lda<-lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
datPred<-data.frame(Species=predict(iris.lda)$class,predict(iris.lda)$x)
library(ellipse)
dat_ell <- data.frame()
for(g in levels(datPred$Species)){
dat_ell <- rbind(dat_ell, cbind(as.data.frame(with(datPred[datPred$Species==g,], ellipse(cor(LD1, LD2), scale=c(sd(LD1),sd(LD2)), centre=c(mean(LD1),mean(LD2))))),Species=g))
}
ggplot(datPred, aes(x=LD1, y=LD2, col=Species) ) + geom_point( size = 4, aes(color = Species))+theme_bw()+geom_path(data=dat_ell,aes(x=x,y=y,color=Species),size=1,linetype=2)
A.K.

Hi,
I wish to add confidence ellipses to my LDA result for the iris data set. Therefore:
Is there statistical logic to doing that, given that I only want it to make the species separation more visible?
How can I add it to the script below (ggplot)?

require(MASS)
require(ggplot2)
iris.lda<-lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
LD1<-predict(iris.lda)$x[,1]
LD2<-predict(iris.lda)$x[,2]
ggplot(iris, aes(x=LD1, y=LD2, col=iris$Species) ) + geom_point( size = 4, aes(color = iris$Species))+theme_bw()

Could someone please help me? Thank you very much.
Re: [R] LDA and confidence ellipse
Hi,
No problem. The default should be 0.95. From ?ellipse():

level: The confidence level of a pairwise confidence region. The default is 0.95, for a 95% region. This is used to control the size of the ellipse being plotted. A vector of levels may be used.

A.K.

- Original Message -
From: Lluis
To: r-help@r-project.org
Cc:
Sent: Thursday, July 11, 2013 3:15 PM
Subject: Re: [R] LDA and confidence ellipse

Hi,
Thanks, works like magic. BTW, what is the confidence ellipse probability used?
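To make the role of level concrete, here is a base R sketch of how a correlation-based ellipse can be drawn at a chosen level. ellipse_points() is a made-up helper, not a function from the ellipse package, and the chi-square radius with 2 degrees of freedom is my understanding of what the package does by default (bivariate normal assumption), so treat it as an assumption rather than documentation.

```r
# Hypothetical helper: points on the ellipse for correlation r at a given level
ellipse_points <- function(r, scale = c(1, 1), centre = c(0, 0),
                           level = 0.95, npoints = 100) {
  t <- sqrt(qchisq(level, df = 2))   # radius in standardised units
  d <- acos(r)                       # angle encoding the correlation
  a <- seq(0, 2 * pi, length.out = npoints)
  cbind(x = t * scale[1] * cos(a + d / 2) + centre[1],
        y = t * scale[2] * cos(a - d / 2) + centre[2])
}

e95 <- ellipse_points(0.5)                 # default 95% region
e50 <- ellipse_points(0.5, level = 0.50)   # smaller 50% region

# Every point satisfies the quadratic form of the correlation matrix
q <- (e95[, "x"]^2 - 2 * 0.5 * e95[, "x"] * e95[, "y"] + e95[, "y"]^2) / (1 - 0.5^2)
range(q)   # both ends should equal qchisq(0.95, 2)
```

A lower level only shrinks the ellipse; its orientation and shape are set by the correlation and the two scale values.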
Re: [R] create new matrix from user-defined function
Hi, Not sure I understand you correctly. I found it easier to index using number than replace it by lengthy column names. You could do it similar to the one below. matNew<-matrix(dat3[rowSums(dat3[c("B_MW_EEsDue1","C_MW_EEsDue2")])!=dat3["D_MW_EEsDueTotal"],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS")) matNew # MW_EEsDue_ERRORS #[1,] 1882 #[2,] 1884 #[3,] 1885 If you have very large dataset, you could also check ?data.table(). library(data.table) dt3<- data.table(dat3) dtNew<-subset(dt3[D_MW_EEsDueTotal!=B_MW_EEsDue1+C_MW_EEsDue2],select=1) dtNew # A_CaseID #1: 1882 #2: 1884 #3: 1885 #Some speed comparisons: set.seed(1254) datTest<- data.frame(A=sample(1000:15000,1e7,replace=TRUE),B= sample(1:10,1e7,replace=TRUE),C=sample(5:15,1e7,replace=TRUE),D=sample(5:25,1e7,replace=TRUE)) system.time(res1<- data.frame(MW_EEsDue_ERRORS=datTest[datTest[[4]] != datTest[[2]]+datTest[[3]],][[1]])) # user system elapsed # 2.256 0.000 2.145 system.time(mat1<-matrix(datTest[rowSums(datTest[,2:3])!=datTest[,4],1],ncol=1,dimnames=list(NULL,"MW_EEsDue_ERRORS"))) # user system elapsed # 0.756 0.088 0.849 system.time(res2<- data.frame(MW_EEsDue_ERRORS=datTest[addmargins(as.matrix(datTest[,2:3]),2)[,3]!=datTest[,4],1])) # user system elapsed #115.740 0.000 105.778 dtTest<- data.table(datTest) system.time(res3<- subset(dtTest[D!=B+C],select=1)) # user system elapsed # 0.508 0.000 0.477 identical(res1,res2) #[1] TRUE setnames(res3,"A","MW_EEsDue_ERRORS") identical(res1,as.data.frame(res3)) #[1] TRUE A.K. - Original Message - From: bcrombie To: r-help@r-project.org Cc: Sent: Thursday, July 11, 2013 3:54 PM Subject: Re: [R] create new matrix from user-defined function Dan and Arun, thank you very much for your replies. They are both very helpful and I love to get different versions of an answer so I can learn more R code. 
You both used indexing to refer to the columns needed in the function, but since my real data frame will be much larger I'm assuming I can replace the index numbers with the names of the columns in quotes instead? I'll try this on my own if you're busy with other forum questions. Thanks, again. From: Nordlund, Dan (DSHS/RDA) [via R] [mailto:ml-node+s789695n4671267...@n4.nabble.com] Sent: Wednesday, July 10, 2013 5:46 PM To: Crombie, Burnette N Subject: Re: create new matrix from user-defined function > -Original Message- > From: [hidden email] > [mailto:r-help-bounces@r- > project.org<mailto:r-help-bounces@r-%20%0b%3e%20project.org>] On Behalf Of > bcrombie > Sent: Wednesday, July 10, 2013 12:19 PM > To: [hidden email] > Subject: [R] create new matrix from user-defined function > > #Let's say I have the following data set: > > dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885), > B_MW_EEsDue1 = c(2, 2, 1, 4, 6), > C_MW_EEsDue2 = c(5, 5, 4, 1, 6), > D_MW_EEsDueTotal = c(7, 9, 5, 6, 112)) > dat3 > # A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal > # 1 1881 2 5 7 > # 2 1882 2 5 9 > # 3 1883 1 4 5 > # 4 1884 4 1 6 > # 5 1885 6 6 112 > > # I want to: > #CREATE A NEW 1-COLUMN MATRIX (of unknown #rows) LISTING ONLY "A"'s > WHERE "D > != B + C" > #THIS COLUMN CAN BE LABELED "MW_EEsDue_ERRORS", and output for this > example > should be: > > # MW_EEsDue_ERRORS > # 1 1882 > # 2 1884 > # 3 1885 > > #What is the best way to do this? Thanks for your time. BNC > > Here is one option, there are many others. Only you can decide what is "best". data.frame(MW_EEsDue_ERRORS=dat3[dat3[[4]] != dat3[[2]]+dat3[[3]],][[1]]) Hope this is helpful, Dan Daniel J. 
Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Re: [R] create new matrix from user-defined function
Hi BNC,
No problem. You could also use ?with()

data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))]))
#  MW_EEsDue_ERRORS
#1             1882
#2             1884
#3             1885
A.K.

- Original Message -
From: "Crombie, Burnette N"
To: arun
Cc: R help
Sent: Thursday, July 11, 2013 4:40 PM
Subject: RE: [R] create new matrix from user-defined function

You understood me perfectly, and I agree it is easier to index using numbers than names. I'm just afraid that if my dataset gets too big I'll mess up which index numbers I'm supposed to be using. "data.table()" looks very useful and a good way to approach the issue. Thanks. I really appreciate your (everyone's) help. BNC
Re: [R] Help with IF command strings
Hi,
Try this:

set.seed(485)
dat1<- as.data.frame(matrix(sample(0:10,26*10,replace=TRUE),ncol=26))
mean(dat1$V21[dat1$V2==1|dat1$V2==0])
#[1] 3.8
#or
with(dat1,mean(V21[V2==1|V2==0]))
#[1] 3.8
A.K.

I have data in 26 columns. I'm trying to get a mean for column 21 only for the participants that are either 0 or 1 in column 2. One of the commands I tried looked something like this:

mean(data1$V21, if(V2 = 1))

So basically I need to have the program run a mean (and later other forms of analysis) on participants based on their condition, either 0 or 1. Help is greatly appreciated. Thanks
Re: [R] Help with IF command strings
Hi,
Not sure I understand your question. Suppose `data1` is your real data. If its column names are different, replace "V21" and "V2" with the names in the real data. Based on your initial post, the column names seemed to be the same.

mean(data1$V21[data1$V2==1|data1$V2==0])
A.K.

What values would I substitute with real data? I did everything the way you posted, and I got 3.8 as well. So I'm curious what values I would change to get the mean for the actual data?
Re: [R] Replicating Rows
Hi, apple<- read.table(text=" Fam.name,Item,AMT.SALE.NET.PROMO,X.CY..QTY.SALE.TOT 9475,Imported Fruits,22110276001,0,436 9499,Imported Fruits,22110277001,0,236 9523,Imported Fruits,22110278001,0,71 ",sep=",",header=TRUE,stringsAsFactors=FALSE) str(apple) #'data.frame': 3 obs. of 4 variables: # $ Fam.name : chr "Imported Fruits" "Imported Fruits" "Imported Fruits" # $ Item : num 2.21e+10 2.21e+10 2.21e+10 # $ AMT.SALE.NET.PROMO: int 0 0 0 # $ X.CY..QTY.SALE.TOT: num 436 236 71 Here, it changed the class of some of the variables. new<-sapply(apple[,-4],rep,apple[,4]) str(as.data.frame(new,stringsAsFactors=FALSE)) #'data.frame': 743 obs. of 3 variables: # $ Fam.name : chr "Imported Fruits" "Imported Fruits" "Imported Fruits" "Imported Fruits" ... # $ Item : chr "22110276001" "22110276001" "22110276001" "22110276001" ... # $ AMT.SALE.NET.PROMO: chr "0" "0" "0" "0" ... new1<-apple[rep(seq_len(nrow(apple)),apple[,4]),-4] row.names(new1)<- 1:nrow(new1) str(new1) #'data.frame': 743 obs. of 3 variables: # $ Fam.name : chr "Imported Fruits" "Imported Fruits" "Imported Fruits" "Imported Fruits" ... # $ Item : num 2.21e+10 2.21e+10 2.21e+10 2.21e+10 2.21e+10 ... # $ AMT.SALE.NET.PROMO: int 0 0 0 0 0 0 0 0 0 0 .. A.K. I try to replicate the rows according to the number of quantity occurred. Its row should be be sum of the quantity. is there any wrong with my code? thanks. 
apple Fam.name Item AMT.SALE.NET.PROMO X.CY..QTY.SALE.TOT 9475 Imported Fruits 22110276001 0 436 9499 Imported Fruits 22110277001 0 236 9523 Imported Fruits 22110278001 0 71 9552 Imported Fruits 22110306001 0 69 9571 Imported Fruits 22110314001 0 20 9579 Imported Fruits 22110315001 0 80 9604 Imported Fruits 22110317001 0 61 9635 Imported Fruits 22110321001 0 1026 9697 Imported Fruits 22110334001 0 223 9720 Imported Fruits 22110335001 0 214 9744 Imported Fruits 22110336001 0 102 9768 Imported Fruits 22110337001 0 146 9868 Imported Fruits 22110354001 118.8 17 9893 Imported Fruits 22110360001 0 43 9904 Imported Fruits 22110363001 0 49 9920 Imported Fruits 22110364001 0 1 9938 Imported Fruits 22110365001 205.4 33 new<-sapply(apple[,-4],rep,apple[,4]) nrow(new) [1] 33572 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
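The difference between the two approaches in the reply is that sapply() builds a matrix, which can hold only one type, so every column ends up as character, while indexing the data frame by repeated row numbers keeps each column's class. A small sketch with made-up quantities (including a zero) follows. The column name QTY is shortened for illustration and is not the name in the original data.

```r
apple <- data.frame(Fam.name = "Imported Fruits",
                    Item = c(22110276001, 22110277001, 22110278001),
                    AMT.SALE.NET.PROMO = c(0, 0, 118.8),
                    QTY = c(3L, 0L, 2L),          # hypothetical small quantities
                    stringsAsFactors = FALSE)

# Repeat each row number QTY times, then index the rows with it
idx <- rep(seq_len(nrow(apple)), times = apple$QTY)
expanded <- apple[idx, names(apple) != "QTY"]
rownames(expanded) <- NULL

nrow(expanded) == sum(apple$QTY)   # one output row per unit sold
str(expanded)
```

A quantity of zero simply drops that row, and the check against sum(apple$QTY) is a quick way to confirm that the expansion did what was intended.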
Re: [R] Needing help for excluding vector elements
Hi,
Try:

set.seed(41)
vec1<- sample(1:50,12000,replace=TRUE)
tail(vec1,-1000)
length(tail(vec1,-1000))
#[1] 11000
A.K.

- Original Message -
From: Olivier Charansonney
To: r-help@r-project.org
Cc:
Sent: Friday, July 12, 2013 6:06 AM
Subject: [R] Needing help for excluding vector elements

Hello,
R for Dummies. How can I exclude the first 1000 values of a vector (length 12000)? More generally, all the values up to the ith?
Thanks for your help,
Dr Olivier Charansonney
Cardiologist
Centre Hospitalier Sud-Francilien, Corbeil-Essonnes, France
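For the general "up to the ith" case, negative indexing also works, but it has a known pitfall when i is 0: -seq_len(0) is an empty index and selects nothing. A logical index avoids that. The helper name drop_first() below is made up for illustration.

```r
set.seed(41)
vec1 <- sample(1:50, 12000, replace = TRUE)

# Hypothetical helper: drop the first i elements, safe for i = 0
drop_first <- function(x, i) x[seq_along(x) > i]

rest <- drop_first(vec1, 1000)
length(rest)

# Negative indexing gives the same result for i >= 1 ...
same <- identical(rest, vec1[-seq_len(1000)])

# ... but returns an empty vector when i is 0
length(vec1[-seq_len(0)])
length(drop_first(vec1, 0))
```

If i is always at least 1, vec1[-seq_len(i)] or tail(vec1, -i) are both fine. The logical form is only needed when i may be zero.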
Re: [R] Help with IF command strings
Hi, Regarding the 2nd issue of mean=3.8 being "too high", could you explain it. #Using the same example: dat1$V21[dat1$V2==1|dat1$V2==0] #[1] 6 2 1 10 0 (6+2+1+10+0)/5 #[1] 3.8 mean(dat1$V21[dat1$V2==1|dat1$V2==0]) #[1] 3.8 About missing data: set.seed(55) dat2<- as.data.frame(matrix(sample(c(NA,0:4),26*10,replace=TRUE),ncol=26)) new example dataset dat2$V2 #[1] 4 NA 0 0 1 3 2 4 2 1 dat2$V21 #[1] NA 3 0 0 2 0 4 0 3 NA (dat2$V2==1|dat2$V2==0) &!is.na(dat2$V2) # [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE dat2$V21[(dat2$V2==1|dat2$V2==0) &!is.na(dat2$V2)] #[1] 0 0 2 NA mean(dat2$V21[(dat2$V2==1|dat2$V2==0) &!is.na(dat2$V2)],na.rm=TRUE) #[1] 0.667 (0+0+2)/3 #[1] 0.667 If this doesn't solve the problem, please provide a reproducible example using ?dput() ex: dput(head(dataset,20)) A.K. When I enter that formula I get "NA" or NaN" as an answer. I have some missing data, which was entered in as NA, so I'm not sure if that is the problem. Originally I thought I would need to do the entire set of equations you posted, but that gave me 3.8 as a mean, which I know is too high to be the mean for this data set. Thanks - Original Message - From: arun To: R help Cc: Sent: Friday, July 12, 2013 8:21 AM Subject: Re: Help with IF command strings Hi, Not sure I understand your question. Suppose `data1` is your real data, but if the column names are different, change "V21", "V2" by those in the real data. Based on your initial post, the column names seemed to be the same. mean(data1$V21[data1$V2==1|data1$V2==0]) A.K. What values would I substitute by real data. I did everything the way you posted, and I got 3.8 as well. So I'm curious what values I would change to get the mean for the actual data? 
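The missing-value handling discussed in this thread can be shortened with %in%, which returns FALSE rather than NA when the value being tested is missing, so no separate !is.na() term is needed. The sketch below re-types the V2 and V21 values printed in the reply instead of regenerating them with set.seed(), so it does not depend on the random number generator.

```r
# Values copied from the dat2 example in the reply
dat2 <- data.frame(V2  = c(4, NA, 0, 0, 1, 3, 2, 4, 2, 1),
                   V21 = c(NA, 3, 0, 0, 2, 0, 4, 0, 3, NA))

keep <- dat2$V2 %in% c(0, 1)   # FALSE, never NA, where V2 is missing
m <- mean(dat2$V21[keep], na.rm = TRUE)
m

# Without na.rm the missing V21 value makes the mean NA
mean(dat2$V21[keep])
```

Getting NA or NaN from the original formula usually points at one of two things: missing values in the column being averaged (use na.rm = TRUE), or no rows matching the condition at all.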
Re: [R] replace multiple values in vector at once
Hi,

library(car)
recode(x,"'x'=1;'y'=2;'z'=3")
#[1] 1 1 1 2 2 2 3 3 3
#or
as.numeric(factor(x))
#[1] 1 1 1 2 2 2 3 3 3
A.K.

- Original Message -
From: Trevor Davies
To: "r-help@r-project.org"
Cc:
Sent: Friday, July 12, 2013 5:56 PM
Subject: Re: [R] replace multiple values in vector at once

I always think that replying to your own r-help post feels silly, but it's good to close these things out. Here's my hack solution:

x1<-merge(data.frame(A=x),data.frame(A=c('x','y','z'),B=c(1,2,2)),by='A')[,2]

Well, that works and should for my more complex situation. If anyone has something a little less heavy-handed I'd love to hear it. Have a great weekend.

On Fri, Jul 12, 2013 at 2:18 PM, Trevor Davies wrote:
> I'm trying to find a function that can replace multiple instances of
> values or characters in a vector in a one-step operation. As an example,
> the vector:
>
> x <- c(rep('x',3),rep('y',3),rep('z',3))
>
> > x
> [1] "x" "x" "x" "y" "y" "y" "z" "z" "z"
>
> I would simply like to replace all of the x's with 1's, y:2 & z:3 (or
> other characters), i.e.:
> > x
> [1] "1" "1" "1" "2" "2" "2" "3" "3" "3"
>
> Of course, I'm aware of the replace function but this obviously gets a
> little unwieldy when there are many of them:
> x<-replace(x,x=='x',1)
> x<-replace(x,x=='y',2)
> x<-replace(x,x=='z',3)
>
> but I can't figure out how to do it in a one-step operation. My real
> need is more complex, obviously. This is one of those seemingly simple
> R operations that should be obvious but I'm coming up empty on this one.
>
> Thanks for the help.
> Trevor
Re: [R] create new matrix from user-defined function
Hi, One alternative would be to change colnames: colnames(dat3)<-1:4 data.frame(MW_EEsDue_ERRORS=with(dat3,`1`[`4`!=rowSums(cbind(`2`,`3`))])) #MW_EEsDue_ERRORS #1 1882 #2 1884 #3 1885 Also, check these: with(dat3,4) #[1] 4 with(dat3,`4`) #[1] 7 9 5 6 112 with(dat3,7) #[1] 7 with(dat3,`7`) #Error in eval(expr, envir, enclos) : object '7' not found A.K. - Original Message - From: bcrombie To: r-help@r-project.org Cc: Sent: Friday, July 12, 2013 4:45 PM Subject: Re: [R] create new matrix from user-defined function AK, I decided to convert your “with” statement back to index-by-number, and I did look up the ?with help info, but I’m confused about my replacement code below. I got the wrong answer (R didn’t apply the function to my column 1 variable “A_CaseID”). What am I doing wrong? Do I need to change my function code re: index “4” (otherwise known as “D_MW_EEsDueTotal” --- my attempts at that have failed also)? thanks. #this is your correct code > data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))])) # MW_EEsDue_ERRORS #1 1882 #2 1884 #3 1885 #these are my incorrect scripts > data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[4!=rowSums(cbind(2,3))])) # MW_EEsDue_ERRORS #1 1881 #2 1882 #3 1883 #4 1884 #5 1885 > data.frame(MW_EEsDue_ERRORS=with(dat3,dat3[[1]][4!=rowSums(cbind(2,3))])) # MW_EEsDue_ERRORS #1 1881 #2 1882 #3 1883 #4 1884 #5 1885 > data.frame(MW_EEsDue_ERRORS=with(dat3,1[4!=rowSums(cbind(2,3))])) # MW_EEsDue_ERRORS #1 1 Original database: dat3 = data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885), B_MW_EEsDue1 = c(2, 2, 1, 4, 6), C_MW_EEsDue2 = c(5, 5, 4, 1, 6), D_MW_EEsDueTotal = c(7, 9, 5, 6, 112)) dat3 # A_CaseID B_MW_EEsDue1 C_MW_EEsDue2 D_MW_EEsDueTotal # 1 1881 2 5 7 # 2 1882 2 5 9 # 3 1883 1 4 5 # 4 1884 4 1 6 # 5 1885 6 6 112 From: arun kirshna [via R] [mailto:ml-node+s789695n4671365...@n4.nabble.com] Sent: Thursday, July 11, 2013 4:55 PM To: Crombie, Burnette N Subject: Re: create new matrix 
from user-defined function Hi BNC, No problem. You could also use ?with() data.frame(MW_EEsDue_ERRORS=with(dat3,A_CaseID[D_MW_EEsDueTotal!=rowSums(cbind(B_MW_EEsDue1,C_MW_EEsDue2))])) # MW_EEsDue_ERRORS #1 1882 #2 1884 #3 1885 A.K.
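On the index-by-number attempts: inside with(), a bare 4 is simply the number 4, not column 4, so 4!=rowSums(cbind(2,3)) compares 4 with 5 and keeps every row. A minimal sketch of indexing by position instead, using [[ ]] on the data frame itself; the names total, parts and errors are introduced here for illustration.

```r
dat3 <- data.frame(A_CaseID = c(1881, 1882, 1883, 1884, 1885),
                   B_MW_EEsDue1 = c(2, 2, 1, 4, 6),
                   C_MW_EEsDue2 = c(5, 5, 4, 1, 6),
                   D_MW_EEsDueTotal = c(7, 9, 5, 6, 112))

# Column 4 holds the reported total; columns 2 and 3 are its parts
total <- dat3[[4]]
parts <- rowSums(dat3[, 2:3])

# Case IDs (column 1) where the reported total disagrees with the sum
errors <- data.frame(MW_EEsDue_ERRORS = dat3[[1]][total != parts])
errors
#  MW_EEsDue_ERRORS
#1             1882
#2             1884
#3             1885
```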
Re: [R] multi-condition summing puzzle
Hi,
Maybe this helps:

dat1<- read.table(text="
ID county date company
1 x 1 comp1
2 y 1 comp3
3 y 2 comp1
4 y 3 comp1
5 x 2 comp2
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<- dat1
dat1$answer<-unsplit(lapply(split(dat1,dat1$county),function(x) do.call(rbind,lapply(seq_len(nrow(x)),function(i) {x1<-x[1:i,]; x2<-table(x1$company)/sum(table(x1$company));sum(x2^2)}))),dat1$county)
dat1
#  ID county date company answer
#1  1      x    1   comp1  1.000
#2  2      y    1   comp3  1.000
#3  3      y    2   comp1  0.500
#4  4      y    3   comp1  0.556
#5  5      x    2   comp2  0.500

#or
dat2$answer<-with(dat2,unlist(ave(company,county,FUN=function(x) lapply(seq_along(x),function(i) {x1<-table(x[1:i]);sum((x1/sum(x1))^2)}))))
dat2
#  ID county date company answer
#1  1      x    1   comp1  1.000
#2  2      y    1   comp3  1.000
#3  3      y    2   comp1  0.500
#4  4      y    3   comp1  0.556
#5  5      x    2   comp2  0.500

A.K.

Hi - I have a seemingly complex data summarizing problem that I am having a hard time wrapping my mind around. What I'm trying to do is sum the squares of all company market shares in a given county, UP TO the corresponding time. Market share is defined as: number of company observations / total observations. Here is example data and the desired answer:

ID county date company answer
1  x      1    comp1   1
2  y      1    comp3   1
3  y      2    comp1   0.5
4  y      3    comp1   0.6
5  x      2    comp2   0.5

For example, to get the answer for ID 4, we look at county y, dates 1, 2, 3 and sum: [(2/3)comp1]^2 + [(1/3)comp3]^2 = 0.6

I've tried cumsum, but am simply stuck given all of the different conditions. I have a large matrix of data for this with several hundred companies, tens of counties and unique dates. Any help would be extremely appreciated. Thank you,
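The same calculation can be split into a small helper, which may be easier to read and test. This is a sketch only: running_hhi is a hypothetical name introduced here, and it assumes the rows are already sorted by date within each county, as in the example data.

```r
dat1 <- data.frame(ID = 1:5,
                   county = c("x", "y", "y", "y", "x"),
                   date = c(1, 1, 2, 3, 2),
                   company = c("comp1", "comp3", "comp1", "comp1", "comp2"),
                   stringsAsFactors = FALSE)

# Sum of squared market shares over the first i observations, for every i
# (hypothetical helper; expects one county's companies in date order)
running_hhi <- function(company) {
  vapply(seq_along(company), function(i) {
    shares <- prop.table(table(company[seq_len(i)]))
    sum(shares^2)
  }, numeric(1))
}

# Apply the helper within each county; ave() puts results back in row order
dat1$answer <- ave(seq_len(nrow(dat1)), dat1$county,
                   FUN = function(idx) running_hhi(dat1$company[idx]))
```

For ID 4 this gives (2/3)^2 + (1/3)^2 = 5/9, which rounds to the 0.556 shown above. With several hundred companies the repeated table() calls may become slow, so this form is meant for clarity rather than speed.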
Re: [R] How to set panel data format
Hi,

as.integer(dat$COUNTRY) # would be the easiest (Rui's solution).

Other options could also be used:

library(plyr)
as.integer(mapvalues(dat$COUNTRY,levels(dat$COUNTRY),seq(length(levels(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
#or
match(dat$COUNTRY,levels(dat$COUNTRY))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

#if `COUNTRY` is not factor
dat$COUNTRY<- as.character(dat$COUNTRY)
as.integer(mapvalues(dat$COUNTRY,unique(dat$COUNTRY),seq(length(unique(dat$COUNTRY)))))
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
#or (if it is sorted already and every country has the same number of rows)
(seq_along(dat$COUNTRY)-1)%/%as.vector(table(dat$COUNTRY))+1
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

A.K.

- Original Message -
From: Rui Barradas
To: serenamas...@gmail.com
Cc: 'r-help'
Sent: Saturday, July 13, 2013 12:04 PM
Subject: Re: [R] How to set panel data format

Hello,

It's better if you keep this on the list; the odds of getting more and better answers are greater. Inline.

On 13-07-2013 15:38, serenamas...@gmail.com wrote:
> Hi Rui,
> thanks for your reply.
>
> No, my problem isn't one of reshaping. It is just that I want R to know I
> have a panel and not just cross sections or time series.
>
> In other words, if I had cross section data:
>
> COUNTRY   YEAR  GDP
> Albania   1999  3
> Barbados  1999  5
> Congo     1999  1
> Denmark   1999  11
> etc. .. ..
>
> My ID here is country, but every observation is a new cluster independent of
> each other, so I don't care to let R know because the ID is a unique
> identifier.
>
> Whereas if I have a panel
>
> COUNTRY   YEAR  GDP
> Albania   1999  3
> Albania   2000  3.5
> Albania   2001  3.7
> Albania   2002  4
> Albania   2003  4.5
> Barbados  1999  5
> Barbados  2000  5
> Barbados  2001  5.1
> Barbados  2002  4
> Barbados  2003  3
> Congo     1999  1
> Congo     2000  2
> Congo     2001  2
> Congo     2002  3
> Congo     2003  4
> Denmark   1999  11
> Denmark   2000  12
> Denmark   2001  13
> Denmark   2002  10
> Denmark   2003  10
> etc. .. ..
>
> How am I going to tell R that Albania is one and the same ID for all the 5 years I
> have in the panel? In other words, Albania has to be identified by the same
> number in the "factor" vector which R codes it with. Then Barbados is ID 2 in
> all its years, Congo has ID 3 and so on.

R already does that; factors are coded as integers:

as.integer(dat$COUNTRY) # Albania is 1, etc

> In STATA, you sort 'by country year' and the program knows it is a panel of
> entities observed more than once over time. But I am not sure how to let R
> know the same.
>
> In practice the reason why it is important to define where a country ends and
> where a new one begins is because
>
> 1) if one creates lags of variables and the program doesn't know where the
> boundaries between countries are, the lag for the first year of Barbados in
> my previous example will be calculated using the last year of Albania, that
> is, the preceding country.

A way of doing this, equivalent to the previous line of code if the countries are grouped consecutively, is

cumsum(c(TRUE, dat$COUNTRY[-nrow(dat)] != dat$COUNTRY[-1L]))

>
> 2) I need to create country dummies that take the value of 1 whenever a
> country ID is equal to 1, so if Albania has 5 years of observations and each
> of the year observations appears with a different ID, the country dummies
> will not be created. Instead if Albania has the same country identifier (1)
> for all the years in which it is observed, the country dummy will be the same
> and ==1 whenever Albania is the country observed

I doubt you need to create dummies; R does it for you when you create a factor. Internally, factors are coded as integers, so all you need is to coerce them to integer like I've said earlier.
Rui Barradas > > Hope this makes it clearer, > Thanks, > Serena
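To make the two concerns in this thread concrete (lags that must not cross country boundaries, and country dummies), here is a minimal base-R sketch. The small data frame is an abbreviated stand-in for the panel quoted above, and the names ID, GDP_lag1 and dummies are introduced here for illustration.

```r
# Abbreviated panel: 3 countries, 3 years each
dat <- data.frame(COUNTRY = factor(rep(c("Albania", "Barbados", "Congo"), each = 3)),
                  YEAR = rep(1999:2001, times = 3),
                  GDP = c(3, 3.5, 3.7, 5, 5, 5.1, 1, 2, 2))

# One integer ID per country, taken from the factor codes
dat$ID <- as.integer(dat$COUNTRY)

# Lag GDP by one year within each country; the first year of each country is NA,
# so Barbados 1999 never borrows Albania's last value
dat$GDP_lag1 <- ave(dat$GDP, dat$COUNTRY, FUN = function(g) c(NA, head(g, -1)))

# Explicit 0/1 country dummies, in case they are needed outside a model formula
dummies <- model.matrix(~ COUNTRY - 1, data = dat)
```

The lag assumes the rows are sorted by year within each country. In a model formula such as lm(GDP ~ COUNTRY) the dummies are created automatically, so the last step is rarely needed.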
Re: [R] simplify a dataframe
Hi,

"when the value of Debut of lines i = value Fin of lines i-1"

That part is not clear, esp. when it is looked upon with the expected output (df2).

Also, in your example dataset:

df1$contrat[grep("^CDD",df1$contrat)]
#[1] "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" "CDD détaché ext. Cirad"
#[4] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. Cirad"
#[7] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. Cirad"
##Looks like there are extra spaces in some of them. I guess these are the same
df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad"

I tried this:

indx<-as.numeric(interaction(df1[,1:6],drop=FALSE))
df1New<- df1
res2<-unique(within(df1New,{Debut<-ave(seq_along(indx),indx,FUN=function(x) Debut[head(x,1)]);Fin<- ave(seq_along(indx),indx,FUN=function(x) Fin[tail(x,1)])}))
row.names(res2)<- 1:nrow(res2)
res2[,c(1,2,7:8)]
   Matricule    Nom      Debut        Fin
1          1  VERON 24/01/1995 31/12/1997
2          6 BENARD 02/02/1995 12/03/1995
3          6 BENARD 13/03/1995 31/01/1996  ###here not correct
4          8 DALNIC 24/01/1995 31/08/1995
5          8 DALNIC 01/09/1995 29/02/2000
6        934  FORNI 26/01/1995 31/08/2001
7        934  FORNI 01/09/2001 31/08/2004
8        934  FORNI 01/09/2004 31/08/2007
9        934  FORNI 01/09/2007 04/09/2012
10       934  FORNI 05/09/2012 31/12/4712

df2[,c(1,2,7:8)]
   Mat    Nom      Debut        Fin
1    1  VERON 24/01/1995 31/12/1997
2    6 BENARD 02/02/1995 12/03/1995
3    6 BENARD 13/03/1995 30/06/1995
4    6 BENARD 01/01/1996 31/01/1996  #missing this row
5    8 DALNIC 24/01/1995 31/08/1995
6    8 DALNIC 01/09/1995 29/02/2000
7  934  FORNI 26/01/1995 31/08/2001
8  934  FORNI 01/09/2001 31/08/2004
9  934  FORNI 01/09/2004 31/08/2007
10 934  FORNI 01/09/2007 04/09/2012
11 934  FORNI 05/09/2012 31/12/4712

Here, the dates look similar to the ones in df2 except for one row in df2.

A.K.
- Original Message - From: Arnaud Michel To: R help Cc: Sent: Friday, July 12, 2013 3:45 PM Subject: [R] simplify a dataframe Hello I have the following problem : group the lines of a dataframe when no information change (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) and when the value of Debut of lines i = value Fin of lines i-1 I can obtain it with a do loop. Is it possible to avoid the loop ? The dataframe initial is df1 dput(df1) structure(list(Matricule = c(1L, 1L, 1L, 6L, 6L, 6L, 6L, 6L, 6L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L, 934L), Nom = c("VERON", "VERON", "VERON", "BENARD", "BENARD", "BENARD", "BENARD", "BENARD", "BENARD", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "DALNIC", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI", "FORNI"), Sexe = c("Féminin", "Féminin", "Féminin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Féminin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin", "Masculin"), DateNaissance = c("02/09/1935", "02/09/1935", "02/09/1935", "01/04/1935", "01/04/1935", "01/04/1935", "01/04/1935", "01/04/1935", "01/04/1935", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", "19/02/1940", 
"19/02/1940", "19/02/1940", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961", "10/07/1961"), contrat = c("CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDI commun", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDD détaché ext. Cirad", "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", "CDI Détachés Autres", "C
Re: [R] Test for column equality across matrices
I tried it on a slightly bigger dataset:

A1 <- matrix(t(expand.grid(1:90, 15, 16)), nrow = 3)
B1 <- combn(90, 3)
which(is.element(columnsOf(B1), columnsOf(A1)))
# [1] 1067 4895 8636 12291 15861 19347 22750 26071 29311 32471 35552 38555
#[13] 41481
which(apply(t(B1),1,paste,collapse="")%in%apply(t(A1),1,paste,collapse=""))
# [1] 1067 4895 8636 12291 15861 19347 22750 26071 29311 32471 35552 38555
#[13] 41481 44331
B1[,44331]
#[1] 14 15 16
which(apply(t(A1),1,paste,collapse="")=="141516")
#[1] 14
B1New<-B1[,!apply(t(B1),1,paste,collapse="")%in%apply(t(A1),1,paste,collapse="")]
newB <- B1[ , !is.element(columnsOf(B1), columnsOf(A1))]
identical(B1New,newB)
#[1] FALSE
is.element(B1[,44331],A1[,14])
#[1] TRUE TRUE TRUE
B1Sp<-columnsOf(B1)
B1Sp[[44331]]
#[1] 14 15 16
A1Sp<- columnsOf(A1)
A1Sp[[14]]
#[1] 14 15 16
is.element(B1Sp[[44331]],A1Sp[[14]])
#[1] TRUE TRUE TRUE

A.K.

- Original Message -
From: William Dunlap
To: Thiem Alrik ; "mailman, r-help"
Cc:
Sent: Saturday, July 13, 2013 1:30 PM
Subject: Re: [R] Test for column equality across matrices

Try

columnsOf <- function(mat) split(mat, col(mat))
newB <- B[ , !is.element(columnsOf(B), columnsOf(A))]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Thiem Alrik
> Sent: Saturday, July 13, 2013 6:45 AM
> To: mailman, r-help
> Subject: [R] Test for column equality across matrices
>
> Dear list,
>
> I have two matrices
>
> A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
> B <- combn(16, 3)
>
> Now I would like to exclude all columns from the 560 columns in B which are identical to
> any 1 of the 6 columns in A. How could I do this?
> > Many thanks and best wishes, > > Alrik
Re: [R] Test for column equality across matrices
Hi,
One way would be:

which(apply(t(B),1,paste,collapse="")%in%apply(t(A),1,paste,collapse=""))
#[1] 105 196 274 340 395
B[,105]
#[1] 1 15 16
B[,196]
#[1] 2 15 16
B1<-B[,!apply(t(B),1,paste,collapse="")%in%apply(t(A),1,paste,collapse="")]
dim(B1)
#[1] 3 555
dim(B)
#[1] 3 560
#or
B2<-B[,is.na(match(interaction(as.data.frame(t(B))),interaction(as.data.frame(t(A)))))]
identical(B1,B2)
#[1] TRUE

A.K.

- Original Message -
From: Thiem Alrik
To: "mailman, r-help"
Cc:
Sent: Saturday, July 13, 2013 9:45 AM
Subject: [R] Test for column equality across matrices

Dear list,

I have two matrices

A <- matrix(t(expand.grid(c(1,2,3,4,5), 15, 16)), nrow = 3)
B <- combn(16, 3)

Now I would like to exclude all columns from the 560 columns in B which are identical to any 1 of the 6 columns in A. How could I do this?

Many thanks and best wishes,
Alrik
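One caveat with paste(..., collapse="") is that different columns can collapse to the same string (for example 1, 15, 16 and 11, 5, 16 both give "11516"). A hedged variant that builds the keys with a separator avoids this; key and newB below are names introduced here for illustration, and the sketch uses the A and B from the original question.

```r
A <- matrix(t(expand.grid(c(1, 2, 3, 4, 5), 15, 16)), nrow = 3)
B <- combn(16, 3)

# One string key per column; the "_" separator keeps 1,15,16 distinct from 11,5,16
key <- function(m) apply(m, 2, paste, collapse = "_")

# Keep only the columns of B whose key does not occur among the columns of A
newB <- B[, !(key(B) %in% key(A)), drop = FALSE]
dim(newB)
#[1]   3 555
```

Because the comparison is done on strings, it does not matter that A is stored as double and B as integer.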
Re: [R] "not all duplicated" question
Hi,
Maybe this helps:

dat1<- read.table(text="
Country, Pet
France, Dog
France, Cat
France, Dog
Canada, Cat
Canada, Cat
Japan, Dog
Japan, Cat
Italy, Cat
",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat1[with(dat1,as.numeric(ave(Pet,Country,FUN=function(x) length(unique(x))))>1),]
#  Country  Pet
#1  France  Dog
#2  France  Cat
#3  France  Dog
#6   Japan  Dog
#7   Japan  Cat

A.K.

- Original Message -
From: Vesco Miloushev
To: r-help@r-project.org
Cc:
Sent: Saturday, July 13, 2013 4:12 PM
Subject: [R] "not all duplicated" question

Hi,

I want to select elements which have duplicates but are not all duplicated. Here is what I mean. Suppose I have a two-column matrix with columns "Country" and "Pet":

Country, Pet
--
France, Dog
France, Cat
France, Dog
Canada, Cat
Canada, Cat
Japan, Dog
Japan, Cat
Italy, Cat

I want to extract all the entries that are duplicated in column "Country" but not ALL duplicated in column "Pet". In this case I want

Country, Pet
--
France, Dog
France, Cat
France, Dog
Japan, Dog
Japan, Cat

Notice that I keep France, because not all are duplicated. If there was no entry "France, Cat" then all of the entries with "France" would be eliminated.

Thanks for your help.
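The same selection can be written in two steps, which some may find easier to follow: count the distinct pets per country, then keep the countries with more than one. This is a sketch with the example data typed in directly; n_distinct, keep and res are names introduced here for illustration.

```r
dat1 <- data.frame(
  Country = c("France", "France", "France", "Canada", "Canada", "Japan", "Japan", "Italy"),
  Pet     = c("Dog", "Cat", "Dog", "Cat", "Cat", "Dog", "Cat", "Cat"),
  stringsAsFactors = FALSE)

# Number of distinct pets recorded for each country
n_distinct <- tapply(dat1$Pet, dat1$Country, function(p) length(unique(p)))

# Countries with at least two different pets
keep <- names(n_distinct)[n_distinct > 1]

# All rows belonging to those countries, in their original order
res <- dat1[dat1$Country %in% keep, ]
```

A country with a single row (Italy here) has one distinct pet and is dropped along with the all-duplicated ones.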
Re: [R] Matrix column flip when recycled
library(plyr)
M.1[,1:2]<-do.call(rbind,alply(replicate(3,M.2),3,function(x) x))
#or
M.1[,1:2]<-matrix(aperm(replicate(3,M.2),c(1,3,2)),ncol=2)

A.K.

- Original Message -
From: Thiem Alrik
To: "mailman, r-help"
Cc:
Sent: Sunday, July 14, 2013 9:48 AM
Subject: [R] Matrix column flip when recycled

Dear list,

I have a matrix M.1 (30x2) into which I would like to paste another matrix M.2 (10x2) three times. However, the columns get flipped in every odd-numbered recycle run. How can I avoid this behavior?

M.1 <- matrix(numeric(30*2), ncol = 2)
M.2 <- t(combn(1:5, 2))
M.1[, 1:2] <- M.2

Many thanks for help,
Alrik
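The flip happens because the assignment recycles M.2 as a flat, column-major vector of 20 values: the first column of M.1 (30 cells) receives column 1 of M.2, then column 2, then column 1 again. A base-R sketch that avoids this by repeating the rows of M.2 explicitly, so that the right-hand side already has 30 rows and nothing is recycled:

```r
M.1 <- matrix(numeric(30 * 2), ncol = 2)
M.2 <- t(combn(1:5, 2))

# Row indices 1..10 repeated three times stack M.2 on top of itself
M.1[, 1:2] <- M.2[rep(seq_len(nrow(M.2)), times = 3), ]
```

rbind(M.2, M.2, M.2) gives the same right-hand side; the rep() form is convenient when the number of copies is held in a variable.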
Re: [R] creating dummy variables based on conditions
Hi,
You could try this (if I understand it correctly):

dat1<- read.table(text="
year id var ans
2010 1 1 1
2010 2 0 0
2010 1 0 1
2010 1 0 1
2011 2 1 1
2011 2 0 1
2011 1 0 0
2011 1 0 0
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$newres<-with(dat1,ave(var,id,year,FUN=function(x) any(x==1)*1))
dat1
#  year id var ans newres
#1 2010  1   1   1      1
#2 2010  2   0   0      0
#3 2010  1   0   1      1
#4 2010  1   0   1      1
#5 2011  2   1   1      1
#6 2011  2   0   1      1
#7 2011  1   0   0      0
#8 2011  1   0   0      0

A.K.

- Original Message -
From: Anup Nandialath
To: r-help@r-project.org
Cc:
Sent: Sunday, July 14, 2013 7:30 AM
Subject: [R] creating dummy variables based on conditions

Hello everyone,

I have a dataset which includes the first three variables from the demo data below (year, id and var). I need to create the new variable ans as follows:

If var=1, then for each year (where var=1), I need to create a new dummy ans which takes the value of 1 for all corresponding id's where an instance of one was recorded. Sample data with the output is shown below.

     year id var ans
[1,] 2010  1   1   1
[2,] 2010  2   0   0
[3,] 2010  1   0   1
[4,] 2010  1   0   1
[5,] 2011  2   1   1
[6,] 2011  2   0   1
[7,] 2011  1   0   0
[8,] 2011  1   0   0

Any help on how to achieve this is much appreciated.

Thanks
Anup
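Since var only takes the values 0 and 1, the group maximum gives the same flag as any(x==1)*1. A short sketch under that assumption, with the demo data typed in directly:

```r
dat1 <- data.frame(year = rep(c(2010, 2011), each = 4),
                   id   = c(1, 2, 1, 1, 2, 2, 1, 1),
                   var  = c(1, 0, 0, 0, 1, 0, 0, 0))

# 1 for every row of a (year, id) group that contains at least one var == 1
# (relies on var being coded strictly as 0/1)
dat1$ans <- ave(dat1$var, dat1$year, dat1$id, FUN = max)
dat1$ans
#[1] 1 0 1 1 1 1 0 0
```

If var could hold other values, the any(x==1) form from the reply above is the safer choice.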
Re: [R] simplify a dataframe
Hi, May be this helps you. df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad" df1[48,8] [1] "31/12/4712" #strange value df1[48,8]<- "31/12/2013" #changed indx<-as.numeric(interaction(df1[,1:6],drop=TRUE)) res<-do.call(rbind,lapply(split(df1,indx),function(x) {x1<- as.Date(x$Debut,format="%d/%m/%Y");x2<- as.Date(x$Fin,format="%d/%m/%Y");do.call(rbind,lapply(split(x,cumsum(c(FALSE,(x1[-1]-x2[-nrow(x)])!=1))),function(x) data.frame(x[1,1:6],Debut=head(x$Debut,1),Fin=tail(x$Fin,1),stringsAsFactors=FALSE)))})) res[order(res$Matricule),] #the order of rows is a bit different than df2. Matricule Nom Sexe DateNaissance contrat Pays 5 1 VERON Féminin 02/09/1935 CDI commun France 4.0 6 BENARD Masculin 01/04/1935 CDI commun France 4.1 6 BENARD Masculin 01/04/1935 CDI commun France 10 6 BENARD Masculin 01/04/1935 CDI commun Philippines 6 8 DALNIC Féminin 19/02/1940 CDI commun France 9 8 DALNIC Féminin 19/02/1940 CDI commun Martinique 1 934 FORNI Masculin 10/07/1961 CDD détaché ext. Cirad Cameroun 2 934 FORNI Masculin 10/07/1961 CDI commun Congo 3 934 FORNI Masculin 10/07/1961 CDI Détachés Autres Congo 7 934 FORNI Masculin 10/07/1961 CDI Détachés Autres France 8 934 FORNI Masculin 10/07/1961 CDI commun Gabon Debut Fin 5 24/01/1995 31/12/1997 4.0 13/03/1995 30/06/1995 4.1 01/01/1996 31/01/1996 10 02/02/1995 12/03/1995 6 24/01/1995 31/08/1995 9 01/09/1995 29/02/2000 1 26/01/1995 31/08/2001 2 05/09/2012 31/12/2013 3 01/09/2004 31/08/2007 7 01/09/2001 31/08/2004 8 01/09/2007 04/09/2012 A.K. From: Arnaud Michel To: arun Cc: R help ; jholt...@gmail.com; Rui Barradas Sent: Sunday, July 14, 2013 12:17 PM Subject: Re: [R] simplify a dataframe Hi, Excuse me for the indistinctness Le 13/07/2013 17:18, arun a écrit : Hi, "when the value of Debut of lines i = value Fin of lines i-1" That part is not clear esp. when it is looked upon with the expected output (df2). 
I want to group the lines which have the same caracteristics (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) and with period of time (Debut/start and Fin/end) without interruption of time. For exemple : The following three lines : Debut/Start Fin/End 1 VERON Féminin 02/09/1935 CDI commun France 24/01/1995 30/04/1997 1 VERON Féminin 02/09/1935 CDI commun France 01/05/1997 30/12/1997 1 VERON Féminin 02/09/1935 CDI commun France 31/12/1997 31/12/1997 are transformed into 1 line 1 VERON Féminin 02/09/1935 CDI commun France 24/01/1995 31/12/1997 because same caracteristicsand period of time without interruption of time (from 24/01/1995 to 31/12/1997) The following six lines : 6 BENARD Masculin 01/04/1935 CDI commun Philippines 02/02/1995 27/02/1995 6 BENARD Masculin 01/04/1935 CDI commun Philippines 28/02/1995 28/02/1995 6 BENARD Masculin 01/04/1935 CDI commun Philippines 01/03/1995 12/03/1995 6 BENARD Masculin 01/04/1935 CDI commun France 13/03/1995 30/06/1995 6 BENARD Masculin 01/04/1935 CDI commun France 01/01/1996 30/01/1996 6 BENARD Masculin 01/04/1935 CDI commun France 31/01/1996 31/01/1996 are transformed into 6 BENARD Masculin 01/04/1935 CDI commun Philippines 02/02/1995 12/03/1995 6 BENARD Masculin 01/04/1935 CDI commun France 13/03/1995 30/06/1995 6 BENARD Masculin 01/04/1935 CDI commun France 01/01/1996 31/01/1996 because lines 1-3 identical for caracteristics and without interruption in time lines 4 and lines 5-6 are not grouped because there is an interruption in time beetween 30/06/1995 and 01/01/1996 Thank you for your help Michel Also, in your example dataset: df1$contrat[grep("^CDD",df1$contrat)] #[1] "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" "CDD détaché ext. Cirad" #[4] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. Cirad" #[7] "CDD détaché ext. Cirad" "CDD détaché ext.Cirad" "CDD détaché ext. Cirad" ##Looks like there are extra spaces in some of them. 
I guess these are the same df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad" I tried this: indx<-as.numeric(interaction(df1[,1:6],drop=FALSE)) df1New&l
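The grouping rule described above (same characteristics, and each Debut exactly one day after the previous Fin) can be sketched without an explicit loop. The data below are a six-row excerpt retyped from the example with only two of the key columns, and the names same_key, contiguous, run and res are introduced here for illustration; this is not the solution posted in the thread.

```r
df <- data.frame(
  Nom   = c("VERON", "VERON", "VERON", "BENARD", "BENARD", "BENARD"),
  Pays  = rep("France", 6),
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "13/03/1995", "01/01/1996", "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997", "30/06/1995", "30/01/1996", "31/01/1996"),
  stringsAsFactors = FALSE)

start <- as.Date(df$Debut, format = "%d/%m/%Y")
end   <- as.Date(df$Fin, format = "%d/%m/%Y")
n     <- nrow(df)

# Does row i share its key columns with row i-1? (first row never does)
same_key <- c(FALSE, df$Nom[-1] == df$Nom[-n] & df$Pays[-1] == df$Pays[-n])

# Does row i start exactly one day after row i-1 ends?
contiguous <- c(FALSE, as.numeric(start[-1] - end[-n]) == 1)

# A new run starts whenever either condition fails
run <- cumsum(!(same_key & contiguous))

# First row of each run supplies the key and Debut, last row supplies Fin
first <- !duplicated(run)
last  <- !duplicated(run, fromLast = TRUE)
res <- data.frame(df[first, c("Nom", "Pays", "Debut")], Fin = df$Fin[last],
                  row.names = NULL)
```

On this excerpt the three VERON rows collapse to one period, while the BENARD rows stay split at the gap between 30/06/1995 and 01/01/1996. The sketch assumes the rows are already sorted by person and start date.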
Re: [R] Need hep for converting date data in POSIXct
HI, Try this: Geo<- read.table(text=" long lat.comp confianza 9.31 -42.72 3 11.66 -40.63 9 10.88 -38.60 9 10.72 -37.86 9 13.06 -39.04 9 16.02 -38.51 6 ",sep="",header=TRUE) col1<- as.numeric(factor(Geo$confianza)) with(Geo, plot(long,lat.comp,col=col1)) A.K. From: laila Aranda Romero To: arun Sent: Sunday, July 14, 2013 3:28 PM Subject: RE: [R] Need hep for converting date data in POSIXct Arun, I contact you again because I have another difficulty with R. I posted the following message but it hasn't been accepted by the fórum filter. So I'm not sure if you can see it I have the following database: head(Geo) long lat.comp confianza 9.31 -42.72 3 11.66 -40.63 9 10.88 -38.60 9 10.72 -37.86 9 13.06 -39.04 9 16.02 -38.51 6 I am trying to plot Geo$ long versus Geo$lat.comp with diferent colours regarding the number of Geo$confianza. I don't know how to make the palette and tell R to plot the points using this palette in the same graph. Regards, Laila > Date: Thu, 11 Jul 2013 03:10:40 -0700 > From: smartpink...@yahoo.com > Subject: Re: [R] Need hep for converting date data in POSIXct > To: laila_...@hotmail.com > > Hi Laila, > No problem. > Regards, > Arun > > > > > - Original Message - > From: laila > To: r-help@r-project.org > Cc: > Sent: Thursday, July 11, 2013 3:38 AM > Subject: Re: [R] Need hep for converting date data in POSIXct > > Arun, the last email has been sent it by itself. I have just found the > problem and it works. Thank very much > > Date: Wed, 10 Jul 2013 19:36:43 -0700 > From: ml-node+s789695n4671274...@n4.nabble.com > To: laila_...@hotmail.com > Subject: Re: Need hep for converting date data in POSIXct > > > > > > Hi, > > I guess the error message: > > > vmask(lat,lon,time,vmax=25) > > Error en vmask(lat, lon, > > time, vmax = 25) : objeto 'lat' no encontrado > > > says that you have not defined the object 'lat'. 
> > > time<-subset(Geo, select =date) > > time[,1]<- as.POSIXct(time[,1],format="%d/%m/%Y %H:%M") > > location<- subset(Geo,select=c(lat.comp,long)) > > time1<- time[,1] > > lat<- location[,1] > > long<- location[,2] > > library(argosfilter) > > vmask(lat,long,time1,25) > > #[1] "end_location" "end_location" "not" "not" > "end_location" > > #[6] "end_location" > > > A.K. > > > > From: laila Aranda Romero <[hidden email]> > > To: arun <[hidden email]> > > Sent: Wednesday, July 10, 2013 6:21 PM > > Subject: RE: [R] Need hep for converting date data in POSIXct > > > > > > > Hi, > > > The code: > > > library(argosfilter) > > setwd("C:/Users/Usuario/Dropbox/Laila Aranda/PUFGRA") > > Geo = > > read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",header=FALSE,sep > > = ",", col.names= c("type", "date", > > "secs", "Trans1", "Trans2", > > "lat.sta", "lat.comp", "long", > > "dist", "rumbo", "velocidad", > > "confianza")) > > View(Geo) > > location=subset(Geo, select= c(lat.comp,long)) > > time=subset(Geo, select =c(date)) > > time[,1]<-as.POSIXct(time[,1],format="%d/%m/%Y > > %H:%M") > > vmask(lat,lon,time,vmax=25) > > > > > > The example: library(argosfilter) > > > setwd("C:/Users/Usuario/Dropbox/LailaAranda/PUFGRA") > > > Geo = > > read.table("2370001_PUFGRA_2009_Gough_000_retarded10_both.trj",header=FALSE,sep > > = ",", col.names= c("type", "date","secs", "Trans1", "Trans2", "lat.sta", > "lat.comp", "long", "dist", "rumbo", "velocidad", "confianza")) > > > str(Geo) > > > 'data.frame': 582 > > obs. of 12 variables: $ > > type : Factor w/ 2 levels > > "midnight","noon": 2 1 2 1 2 1 2 1 2 1 ... > > $ > > date : Factor w/ 582 levels > > "01/01/2009 01:58",..: 370 389 390 409 410 429 430 450 451 471 ... > > >
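On the palette question earlier in this thread: one way to control which colour goes with which confianza value is to match the values against an explicit palette and add a legend. A minimal sketch with the six rows shown in the post; the colour names and the objects lev, pal and cols are choices made here for illustration.

```r
Geo <- data.frame(long      = c(9.31, 11.66, 10.88, 10.72, 13.06, 16.02),
                  lat.comp  = c(-42.72, -40.63, -38.60, -37.86, -39.04, -38.51),
                  confianza = c(3, 9, 9, 9, 9, 6))

# One colour per distinct confianza value, in increasing order
lev  <- sort(unique(Geo$confianza))
pal  <- c("red", "orange", "darkgreen")
cols <- pal[match(Geo$confianza, lev)]

# All points in one plot, coloured by confianza, with a matching legend
plot(Geo$long, Geo$lat.comp, col = cols, pch = 19, xlab = "long", ylab = "lat.comp")
legend("topleft", legend = lev, col = pal, pch = 19, title = "confianza")
```

The palette needs one entry per distinct value; with the full data set it would have to be extended if more than three confianza levels occur.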
Re: [R] simplify a dataframe
HI Michel,
This gives the same order as that of df2.

df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad"
df1[48,8]<- "31/12/2013"
indx<-as.numeric(interaction(df1[,1:6],drop=TRUE))
lst1<-split(df1,indx)
lst2<-lst1[match(unique(indx),names(lst1))]
res<-do.call(rbind,lapply(lst2,function(x){
  x1<- as.Date(x$Debut,format="%d/%m/%Y")
  x2<- as.Date(x$Fin,format="%d/%m/%Y")
  do.call(rbind,lapply(split(x,cumsum(c(FALSE,(x1[-1]-x2[-nrow(x)])!=1))),function(x)
    data.frame(x[1,1:6],Debut=head(x$Debut,1),Fin=tail(x$Fin,1),stringsAsFactors=FALSE)))}))
row.names(res)<- 1:nrow(res)
df2[11,8]<- "31/12/2013"
names(res)[1]<- "Mat"
identical(res,df2)
#[1] TRUE
A.K.

----- Original Message -----
From: arun
To: Arnaud Michel
Cc: R help
Sent: Sunday, July 14, 2013 2:39 PM
Subject: Re: [R] simplify a dataframe

Hi,
Maybe this helps you.

df1$contrat[grep("^CDD",df1$contrat)]<- "CDD détaché ext. Cirad"
df1[48,8]
[1] "31/12/4712" #strange value
df1[48,8]<- "31/12/2013" #changed
indx<-as.numeric(interaction(df1[,1:6],drop=TRUE))
res<-do.call(rbind,lapply(split(df1,indx),function(x) {
  x1<- as.Date(x$Debut,format="%d/%m/%Y")
  x2<- as.Date(x$Fin,format="%d/%m/%Y")
  do.call(rbind,lapply(split(x,cumsum(c(FALSE,(x1[-1]-x2[-nrow(x)])!=1))),function(x)
    data.frame(x[1,1:6],Debut=head(x$Debut,1),Fin=tail(x$Fin,1),stringsAsFactors=FALSE)))}))
res[order(res$Matricule),] #the order of rows is a bit different than df2.
    Matricule    Nom     Sexe DateNaissance                contrat        Pays
5           1  VERON  Féminin    02/09/1935             CDI commun      France
4.0         6 BENARD Masculin    01/04/1935             CDI commun      France
4.1         6 BENARD Masculin    01/04/1935             CDI commun      France
10          6 BENARD Masculin    01/04/1935             CDI commun Philippines
6           8 DALNIC  Féminin    19/02/1940             CDI commun      France
9           8 DALNIC  Féminin    19/02/1940             CDI commun  Martinique
1         934  FORNI Masculin    10/07/1961 CDD détaché ext. Cirad    Cameroun
2         934  FORNI Masculin    10/07/1961             CDI commun       Congo
3         934  FORNI Masculin    10/07/1961    CDI Détachés Autres       Congo
7         934  FORNI Masculin    10/07/1961    CDI Détachés Autres      France
8         934  FORNI Masculin    10/07/1961             CDI commun       Gabon
         Debut        Fin
5   24/01/1995 31/12/1997
4.0 13/03/1995 30/06/1995
4.1 01/01/1996 31/01/1996
10  02/02/1995 12/03/1995
6   24/01/1995 31/08/1995
9   01/09/1995 29/02/2000
1   26/01/1995 31/08/2001
2   05/09/2012 31/12/2013
3   01/09/2004 31/08/2007
7   01/09/2001 31/08/2004
8   01/09/2007 04/09/2012
A.K.

From: Arnaud Michel
To: arun
Cc: R help ; jholt...@gmail.com; Rui Barradas
Sent: Sunday, July 14, 2013 12:17 PM
Subject: Re: [R] simplify a dataframe

Hi,
Excuse me for the lack of clarity.

On 13/07/2013 17:18, arun wrote:
> Hi,
> "when the value of Debut of lines i = value Fin of lines i-1"
> That part is not clear, esp. when it is looked at together with the expected output (df2).

I want to group the lines which have the same characteristics (Matricule, Nom, Sexe, DateNaissance, Contrat, Pays) and whose periods of time (Debut/start and Fin/end) follow one another without interruption.
For example, the following three lines:

                                                         Debut/Start Fin/End
1 VERON Féminin 02/09/1935 CDI commun France 24/01/1995 30/04/1997
1 VERON Féminin 02/09/1935 CDI commun France 01/05/1997 30/12/1997
1 VERON Féminin 02/09/1935 CDI commun France 31/12/1997 31/12/1997

are transformed into 1 line

1 VERON Féminin 02/09/1935 CDI commun France 24/01/1995 31/12/1997

because they have the same characteristics and cover a period of time without interruption (from 24/01/1995 to 31/12/1997).

The following six lines:

6 BENARD Masculin 01/04/1935 CDI commun Philippines 02/02/1995 27/02/1995
6 BENARD Masculin 01/04/1935 CDI commun Philippines 28/02/1995 28/02/1995
6 BENARD Masculin 01/04/1935 CDI commun Philippines 01/03/1995 12/03/1995
6 BENARD Masculin 01/04/1935 CDI commun France 13/03/1995 30/06/1995
6 BENARD Masculin 01/04/1935 CDI commun France 01/01/1996 30/01/1996
6 BENARD Masculin 01/04/1935 CDI commun France 31/01/1996 31/01/1996

are transformed into

6 BENARD Masculin 01/04/1935 CDI commun Philippines 02/02/1995 12/03/1995
6 BENARD Masculin 01/04/1935 CDI commun France 13/03/1995 30/06/1995
6 BENARD Masculin 01/04/1935 CDI commun Franc
Re: [R] t-test across columns
Hi,
Not sure about the format for the 2nd part.

df1<- ##data
library(plyr)
df2<-ddply(df1,.(name,cat),summarize, cbind(t.test(val,df1$val)$statistic,t.test(val,df1$val)$p.value))
df3<-cbind(df2[,1:2],data.frame(df2[,3]))
colnames(df3)[3:4]<- c("t-val","p.val")
library(reshape2)
df3m<- melt(df3,id.var=c("name","cat"))
xtabs(value~name+cat+variable,data=df3m)
, , variable = t-val

      cat
name      p178266580    p178269196    p178316310    p191287337    p195158904
  12.2 -1.1697701975 -5.2812696387 -1.2740973341  2.1926665883  0.1529759080
  15.9 -2.5063901671          0.00 -0.2169806106  1.5455008954 -1.6574358795
      cat
name      p196921846    p197427158    p238921966
  12.2  0.2260409495 -0.3320635130  3.3659689025
  15.9  6.6278680348          0.00          0.00

, , variable = p.val

      cat
name      p178266580    p178269196    p178316310    p191287337    p195158904
  12.2  0.3092408498  0.0003382099  0.3762474897  0.0419925673  0.8812900356
  15.9  0.0147796276          0.00  0.8365830321  0.1822041450  0.1096087365
      cat
name      p196921846    p197427158    p238921966
  12.2  0.8226135494  0.7435688987  0.0071990164
  15.9  0.0005489640          0.00          0.00

#or
res<-dcast(df3m,name~cat+variable,value.var="value")
row.names(res)<- res[,1]
res1<- res[,-1]
res1
     p178266580_t-val p178266580_p.val p178269196_t-val p178269196_p.val
12.2         -1.16977       0.30924085         -5.28127     0.0003382099
15.9         -2.50639       0.01477963               NA               NA
     p178316310_t-val p178316310_p.val p191287337_t-val p191287337_p.val
12.2       -1.2740973        0.3762475         2.192667       0.04199257
15.9       -0.2169806        0.8365830         1.545501       0.18220414
     p195158904_t-val p195158904_p.val p196921846_t-val p196921846_p.val
12.2        0.1529759        0.8812900        0.2260409      0.822613549
15.9       -1.6574359        0.1096087        6.6278680      0.000548964
     p197427158_t-val p197427158_p.val p238921966_t-val p238921966_p.val
12.2       -0.3320635        0.7435689         3.365969      0.007199016
15.9               NA               NA               NA               NA
A.K.

----- Original Message -----
From: Nico Met
To: R help
Cc:
Sent: Monday, July 15, 2013 11:50 AM
Subject: [R] t-test across columns

Dear all,
I would like to do t-test across two columns "name" with different "cat" with overall mean ("val").
(Removing if there is a single observation) And finally, make a matrix with t-value and p-value associated with a name (in rows) and cat (in columns) dput(x) structure(list(name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("12.2", "15.9" ), class = "factor"), cat = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("p178266580", "p178269196", "p178316310", "p191287337", "p195158904", "p196921846", "p197427158", "p238921966"), class = "factor"), val = c(148.90772, 184.253375, 183.97486667, 191.868125, 173.30515, 187.876975, 177.453775, 184.799525, 212.39065, 205.504525, 186.152025, 194.337075, 193.2703, 204.71665, 211.4452, 202.609175, 203.72918, 193.7261, 196.1186, 202.79556, 203.48818, 191.13744, 205.23315, 198.66842, 196.81032, 200.90512, 206.13564, 205.372225, 196.22835, 211.04686, 219.9771, 224.7602, 231.6596, 211.10581667, 215.44474, 210.83514, 228.173125, 224.09034, 212.96026, 239.0085, 213.5407, 227.12115, 209.24888, 232.8964, 232.22146, 228.1643, 236.43082, 232.20792, 238.49192, 224.64014, 233.75898, 207.06138, 215.3649, 211.14802, 201.86854, 200.52278, 199.05752, 194.90904, 214.44334, 249.35726667, 239.98525, 234.50848333, 243.86508333, 233.59581667, 248.1219, 225.28941667, 248.22088333, 193.69566, 198.43578, 205.06055, 208.525975, 
198.28692, 206.88496, 201.60162, 205.7943, 210.5117, 196.69886, 193.58288, 198.86094, 201.81676, 225.8266, 205.879725, 218.370475, 214.006125, 198.74038, 206.00314, 198.37446, 225.5357, 216.721025, 226.543925, 158.1011, 158.15674, 166.07518, 179.942225, 158.16046, 165.0685, 159.56146 )), .Names = c("name", "cat", "val"), class = "data.frame", row.names = c( NA, 97L)) Thanks Nico [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.eth
Re: [R] t-test across columns
Hi,
Maybe I misunderstood your question. The output David got could also be obtained by:

#df1 dataset
library(plyr)
df2<-ddply(df1,.(cat),function(x) if(min(table(x$name))>1){x1<- t.test(val~name,x);cbind(t=x1$statistic,p.value=x1$p.value)})
df2
#         cat          t      p.value
#1 p178266580 -0.1156475 0.9144054453
#2 p178316310 -1.0874356 0.4143944591
#3 p191287337 -0.6776053 0.5315717871
#4 p195158904  1.1522850 0.2769290728
#5 p196921846 -4.2342996 0.0003925339

But the second part is still unclear.
A.K.

----- Original Message -----
From: David Carlson
To: 'Nico Met' ; 'R help'
Cc:
Sent: Monday, July 15, 2013 1:33 PM
Subject: Re: [R] t-test across columns

This may be close to what you want:

> t.val <- by(x, x$cat, function(y) if (min(table(y$name)>1)) {
+ t.test(val~name, y)})
> t.out <- do.call(rbind, sapply(t.val, function(y) c(y$statistic,
+ p.value=y$p.value)))
> t.out
                    t      p.value
p178266580 -0.1156475 0.9144054453
p178316310 -1.0874356 0.4143944591
p191287337 -0.6776053 0.5315717871
p195158904  1.1522850 0.2769290728
p196921846 -4.2342996 0.0003925339

But I'm not sure what you mean about columns for each cat unless you want the frequencies:

> freq.out <- xtabs(~cat+name, x)
> freq.out <- freq.out[apply(freq.out, 1, function(y) min(y) > 1),]
> freq.out
            name
cat          12.2 15.9
  p178266580    4   11
  p178316310    2    3
  p191287337    3    5
  p195158904    8    7
  p196921846   26    5
> results <- cbind(freq.out, t.out)
> results
           12.2 15.9          t      p.value
p178266580    4   11 -0.1156475 0.9144054453
p178316310    2    3 -1.0874356 0.4143944591
p191287337    3    5 -0.6776053 0.5315717871
p195158904    8    7  1.1522850 0.2769290728
p196921846   26    5 -4.2342996 0.0003925339

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
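The per-category tests in this thread can also be written with base R only. What follows is a sketch on invented data, not a rerun of the posted analysis: the data frame `toy`, its category labels c1 to c4 and the helper logic are all assumptions made for illustration. Categories in which either `name` has fewer than two observations are skipped, in the spirit of the "removing if there is a single observation" request.

```r
set.seed(1)

# Invented data: three categories with both names, one (c4) with a single name
toy <- data.frame(
  name = c(rep(c("12.2", "15.9"), times = 12), "12.2", "12.2"),
  cat  = c(rep(c("c1", "c2", "c3"), each = 8), "c4", "c4"),
  val  = c(rnorm(24, mean = 200, sd = 10), 190, 195)
)

res <- lapply(split(toy, toy$cat), function(d) {
  n <- table(d$name)
  # Skip categories without two groups of at least two observations each
  if (length(n) < 2 || min(n) < 2) return(NULL)
  tt <- t.test(val ~ name, data = d)   # Welch two-sample t-test
  c(t = unname(tt$statistic), p.value = tt$p.value)
})

# Drop the skipped categories, then stack the rest into a matrix
res <- do.call(rbind, Filter(Negate(is.null), res))
res
```

The result has one row per retained category and the columns `t` and `p.value`, which is the same layout as `t.out` above.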
Re: [R] file.stem?
Hi,
Maybe this also works (file_path_sans_ext() is in the tools package, which is not attached by default):

library(tools)
basename(file_path_sans_ext("/the/path/to/afile.txt"))
#[1] "afile"
A.K.

----- Original Message -----
From: Rui Barradas
To: Witold E Wolski
Cc: r-help@r-project.org
Sent: Monday, July 15, 2013 10:32 AM
Subject: Re: [R] file.stem?

Hello,
You can use ?basename to write a file.stem function:

basename("/the/path/to/afile.txt")
file.stem <- function(x){
    bn <- basename(x)
    gsub("\\..*$", "", bn)
}
file.stem("/the/path/to/afile.txt")

Hope this helps,
Rui Barradas

On 15-07-2013 15:23, Witold E Wolski wrote:
> Looking for a function which returns the stem of the filename given a path.
> i.e.
>> file.stem("/the/path/to/afile.txt")
>> afile
>
> regards
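The two suggestions in this thread can be combined into one small helper. This is a sketch only; `file_stem` is a made-up name, not a base R function. It calls tools::file_path_sans_ext() with the namespace prefix so that no library() call is needed.

```r
# file_stem() is a hypothetical helper: strip the directory, then the extension
file_stem <- function(path) tools::file_path_sans_ext(basename(path))

file_stem("/the/path/to/afile.txt")
file_stem(c("/a/b/report.csv", "notes"))

# The two approaches differ when a name contains several dots:
# the regular expression cuts at the first dot, file_path_sans_ext()
# removes only the last extension.
gsub("\\..*$", "", "archive.tar.gz")
tools::file_path_sans_ext("archive.tar.gz")
```

Which behaviour is wanted for names such as "archive.tar.gz" depends on the use case, so it is worth checking on a few real file names first.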
Re: [R] converting numeric to character and using character pattern
HI Irucka,
Maybe this is what you wanted:

pat<-paste(paste0("http://www.",siter[,1],"..rdb"),collapse="|")
pat
[1] "http://www.02437100..rdb|http://www.02439500..rdb|http://www.02441500..rdb|http://www.02446500..rdb|http://www.02467000..rdb|
A.K.

Hi,
I am having a problem with my data set and conversion from numeric to character. Below is my code with comments on the specific problem below:

hydraulicsites <- read.table("hydraulic_geometry_sites.csv", header = TRUE, sep = "\t", as.is = TRUE, stringsAsFactors = FALSE, colClasses = c("character",NA))
siter <- hydraulicsites[1]
dput(siter)
structure(list(site_no = c("02437100", "02439500", "02441500",
"02446500", "02467000", "02470050", "03217500", "03219500", "03220510",
"03227500", "03230700", "03231500", "03455000", "03497000", "03439000",
"03439500", "0344", "03441000", "03454500", "03479000", "03513500",
"04177500", "04183500", "04185000", "04185500", "04186500", "04187500",
"04188000", "04189000", "04189500", "0419", "04191500", "04192500",
"04193500", "06191500", "06214500", "06218500", "06222000", "06225500",
"06228000", "06235500", "06259500", "06262000", "06264000", "06266000",
"06269500", "06273000", "06276500", "06277500", "06279500", "06287000",
"06288500", "06288500", "06289000", "06290500", "06293500", "06294700",
"06329500", "0631", "06309500", "06312500", "06311000", "06311500",
"06313000", "06313500", "06315500", "06314000", "06315000", "06316500",
"06317000", "06317500", "06318500", "06319500", "0632", "06320500",
"06323000", "06323500", "06324000", "06325500", "06324500", "06326500",
"06426500", "06428000", "06428500", "06436000", "06437000", "06438000",
"06821500", "0683", "06850500", "06856600", "0686", "06862500",
"06864000", "06864500", "06865500", "06866000", "06877600", "06879500",
"06887500", "06889000", "06891000", "06892500", "06342500", "0644",
"06818000", "06893000", "06934500", "05587500", "0701", "07032000",
"07289000")), .Names = "site_no", class = "data.frame", row.names = c(NA,
-112L))
pat <- paste("http://www.", siter, "..rdb", sep="", collapse = "|")
str(pat)
chr "www.c(\"02437100\", \"02439500\", \"02441500\", \"02446500\", \"02467000\""| __truncated__

OK, the problem is with pat. I need pat to have the same form as patter. I have a list of sites in .csv files that I need to process, so I would like a more efficient way of doing this than the one shown below. Is there a way to get the results in pat to resemble those in patter?

sites3 <- c("07103990", "402114105350101", "05056215")
patter <- paste("www.", sites3, "..rdb", sep="", collapse = "|")
dput(patter)
"www.07103990..rdb|www.402114105350101..rdb|www.05056215..rdb"

Thank you.
Irucka Embry
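The difference between pasting a data frame and pasting one of its columns can be reproduced on a three-row example. This is a sketch under assumptions: `siter` here is a small made-up data frame built from the three site numbers in `sites3`, not the 112-row file from the post.

```r
# Hypothetical one-column data frame standing in for `siter`
siter <- data.frame(site_no = c("07103990", "402114105350101", "05056215"),
                    stringsAsFactors = FALSE)

# paste() on the data frame itself deparses the whole column into one string,
# which is where the 'www.c("...' result in the question comes from
bad <- paste("www.", siter, "..rdb", sep = "", collapse = "|")
bad

# Extracting the column first gives one piece per site
pat <- paste0("www.", siter$site_no, "..rdb", collapse = "|")
pat
```

`siter[, 1]`, `siter[[1]]` and `siter$site_no` all return the character vector, whereas `siter` and `hydraulicsites[1]` are still data frames.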
Re: [R] converting numeric to character and using character pattern
#or
pat1<-paste("http://www.", siter[,1], "..rdb", sep="", collapse = "|")
identical(pat,pat1)
#[1] TRUE
A.K.

----- Original Message -----
From: arun
To: Irucka Embry
Cc: R help
Sent: Monday, July 15, 2013 2:47 PM
Subject: Re: converting numeric to character and using character pattern

HI Irucka,
Maybe this is what you wanted:

pat<-paste(paste0("http://www.",siter[,1],"..rdb"),collapse="|")
pat
[1] "http://www.02437100..rdb|http://www.02439500..rdb|http://www.02441500..rdb|http://www.02446500..rdb|http://www.02467000..rdb|
A.K.
Re: [R] Deleting specific rows from a dataframe
Hi,
If I understand it correctly,

df1<- read.table(text="
  sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P
",sep="",header=TRUE,stringsAsFactors=FALSE)
df1[rowSums(df1=="P")==ncol(df1),]
#  sample1 sample2 sample3 sample4 sample5
#c       P       P       P       P       P
#d       P       P       P       P       P
#f       P       P       P       P       P
#h       P       P       P       P       P
A.K.

----- Original Message -----
From: Chirag Gupta
To: r-help@r-project.org
Cc:
Sent: Monday, July 15, 2013 9:10 PM
Subject: [R] Deleting specific rows from a dataframe

I have a data frame like shown below

  sample1 sample2 sample3 sample4 sample5
a P P I P P
b P A P P A
c P P P P P
d P P P P P
e M P M A P
f P P P P P
g P P P A P
h P P P P P

I want to keep only those rows which have all "P" across all the columns. Since the matrix is large (about 20,000 rows), I cannot do it in excel. Any special function that I can use?

--
*Chirag Gupta*
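One case the rowSums() filter above does not cover is missing values: a single NA in a row makes its row sum NA, and indexing with NA then returns an all-NA row. The sketch below shows one way to guard against that. The small data frame is invented for illustration, and treating a missing call as "not P" is an assumption, not something stated in the thread.

```r
# Invented example with one missing value in row "d"
df1 <- data.frame(
  sample1 = c("P", "P", "P", NA),
  sample2 = c("P", "A", "P", "P"),
  sample3 = c("I", "P", "P", "P"),
  row.names = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# na.rm = TRUE counts only the non-missing "P" entries, so a row with an NA
# can never reach ncol(df1) and is dropped
keep <- rowSums(df1 == "P", na.rm = TRUE) == ncol(df1)

# drop = FALSE keeps a data frame even if only one column or row remains
df1[keep, , drop = FALSE]
```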
Re: [R] Deleting specific rows from a dataframe
You mentioned data.frame at one place and matrix at another. Matrix would be faster.

#Speed comparison
set.seed(1454)
dfTest<- as.data.frame(matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5))
system.time(res<-dfTest[rowSums(dfTest=="P")==ncol(dfTest),])
#   user  system elapsed
#  0.628   0.020   0.649
dim(res)
#[1] 952 5

set.seed(1454)
mat1<- matrix(sample(LETTERS[15:18],5*1e6,replace=TRUE),ncol=5)
system.time(res1<-mat1[rowSums(mat1=="P")==ncol(mat1),])
#   user  system elapsed
#  0.188   0.004   0.194
dim(res1)
#[1] 952 5

#Other options include
system.time(res3<- dfTest[apply(sweep(dfTest,1,"P","=="),1,all),])
#   user  system elapsed
#  5.988   0.120   6.120
identical(res,res3)
#[1] TRUE

system.time(res2<- dfTest[apply(dfTest,1, function(x) all(length(table(x))==ncol(dfTest) | names(table(x))=="P") ), ])
#   user  system elapsed
#351.492   0.040 352.164
row.names(res2)<- row.names(res3)
attr(res3,"row.names")<- attr(res2,"row.names")
identical(res2,res3)
#[1] TRUE
A.K.
Re: [R] (1 - 0.7) == 0.3
HI,

2-0.7==0.3
#[1] FALSE
##Maybe you meant
2-0.7==1.3
#[1] TRUE

This is probably R FAQ 7.31: most decimal fractions such as 0.7 and 0.3 cannot be represented exactly in binary floating point, so exact comparison with == can fail. Also, check http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

all.equal(2-0.7,1.3)
#[1] TRUE
all.equal(1-0.7,0.3)
#[1] TRUE
(1-0.7)<(0.3+.Machine$double.eps^0.5)
#[1] TRUE

p <- c(0.2, 0.4, 0.6, 0.8, 1)
round((1-p)*5,1)+1
#[1] 5 4 3 2 1

In your second example,

p <- c(0.8, 0.6, 0.4, 0.2, 0)
floor((1 - p) * 5) + 1
#[1] 1 3 4 5 6
((1-0.8)*5) +1
#[1] 2
round((1-p)*5,1)+1
#[1] 2 3 4 5 6
A.K.

...is false :( However (2 - 0.7) == 0.3 is true. Is there any way to get around this? The end goal is for this to work:

p <- c(0.2, 0.4, 0.6, 0.8, 1)
floor((1 - p) * 5) + 1
> 5 4 3 1 1

whereas the correct result would have been 5 4 3 2 1. If I set p <- c(0.8, 0.6, 0.4, 0.2, 0) then it works as expected.
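For the binning problem itself, one common workaround is to nudge the value by a small tolerance before calling floor(). The sketch below uses the `p` from the question; the tolerance `tol` is an assumed value, chosen to be far larger than the rounding error and far smaller than the gap between the intended bin edges.

```r
p <- c(0.2, 0.4, 0.6, 0.8, 1)

# Exact comparison fails because 1 - 0.7 and 0.3 differ in the last bits
(1 - 0.7) == 0.3

# all.equal() compares with a tolerance; wrap it in isTRUE() inside if()
isTRUE(all.equal(1 - 0.7, 0.3))

# tol is an assumption: adjust it to the scale of the data
tol <- 1e-9
floor((1 - p) * 5 + tol) + 1
# [1] 5 4 3 2 1
```

Rounding before flooring, as in the round((1-p)*5,1) call in the reply above, is an alternative when the values are known to lie on a fixed decimal grid.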
Re: [R] Errors using large numbers ((i) all entries of 'x' must be nonnegative and finite and (ii) NAs introduced by coercion)
HI,

?as.integer() #documentation

Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: ‘double’s can hold much larger integers exactly.

as.numeric(c(75533, 4756922556, 88210, 6715122129))
#[1] 75533 4756922556 88210 6715122129
#or
as.double(c(75533, 4756922556, 88210, 6715122129))
#[1] 75533 4756922556 88210 6715122129
A.K.

----- Original Message -----
From: PIKAL Petr
To: jgibbons1 ; "r-help@r-project.org"
Cc:
Sent: Tuesday, July 16, 2013 12:54 PM
Subject: Re: [R] Errors using large numbers ((i) all entries of 'x' must be nonnegative and finite and (ii) NAs introduced by coercion)

Well,
You could find it yourself:

> as.integer(c(75533, 4756922556, 88210, 6715122129))
[1] 75533 NA 88210 NA
Warning message:
NAs introduced by coercion
> matrix(c(75533, 4756922556, 88210, 6715122129), nrow=2)
           [,1]       [,2]
[1,]      75533      88210
[2,] 4756922556 6715122129

Using as.integer introduces NA because the integer type has a limited range.

Petr

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of jgibbons1
> Sent: Tuesday, July 16, 2013 4:44 PM
> To: r-help@r-project.org
> Subject: [R] Errors using large numbers ((i) all entries of 'x' must be
> nonnegative and finite and (ii) NAs introduced by coercion)
>
> Hello,
> I am fairly new to R, so please forgive me if this is a fairly easy
> solution.
>
> I am trying to perform multiple Fisher's Exact tests or Pearson's
> Chi-squared contingency tests from a data matrix in which each row
> is data for an independent test.
>
> My data is formatted as such:
>
> AAA 75533 4756922556 88210 6715122129
> BBB 14869 4756983220 16384 6715193955
> CCC 7230 4756990859 8559 6715201780
> DDD 18332 4756979757 23336 6715187003
> EEE 14733 4756983356 16826 6715193513
> FFF 2918 4756995171 3433 6715206906
> GGG 3726 4756994363 4038 6715206301
> HHH 6196 4756991893 7011 6715203328
> III 7925 4756990164 9130 6715201209
> JJJ 1434 4756996655 1602 6715208737
>
> Where the 1st column is the identifier, the 2nd column = observations 1,
> the 3rd column = background counts 1, the 4th column = observations 2
> and the 5th column = background counts 2.
>
> I am loading my data as such:
>
> > data=read.table("My.File", header=FALSE)
>
> And I am looping through each row to perform a test like this:
>
> > pvalues=c("pvalue")
> > for(i in 1:10){
> + datamatrix=matrix(c(as.integer(data[i,2:5])),nrow=2)
> + fisherresult=fisher.test(datamatrix)
> + pvalues=cbind(pvalues,fisherresult[1])
> + }
>
> Here is the error I am getting:
>
> Error in fisher.test(datamatrix) :
>   all entries of 'x' must be nonnegative and finite
> In addition: Warning messages:
> 1: In matrix(c(as.integer(data[i, 2:5])), nrow = 2) :
>   NAs introduced by coercion
> 2: In matrix(c(as.integer(data[i, 2:5])), nrow = 2) :
>   NAs introduced by coercion
>
> When I replace the large numbers in the 3rd and 5th columns with smaller
> numbers, the statistical calculation works fine.
>
> Any ideas? Any help would be GREATLY appreciated!
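The row-by-row loop from the question can be rewritten so that the counts are never coerced to integer. The sketch below is an illustration under assumptions: it uses only the first two rows of the posted table, the column names `id`, `obs1`, `bg1`, `obs2` and `bg2` are made up, and it uses chisq.test() (one of the two tests the poster mentioned) because fisher.test() may still reject counts beyond the integer range even when they are stored as doubles.

```r
# Two rows of the posted table; the column names are invented for this sketch
dat <- data.frame(
  id   = c("AAA", "BBB"),
  obs1 = c(75533, 14869),
  bg1  = c(4756922556, 4756983220),
  obs2 = c(88210, 16384),
  bg2  = c(6715122129, 6715193955)
)

.Machine$integer.max   # largest value an R integer can hold

pvalues <- sapply(seq_len(nrow(dat)), function(i) {
  # unlist() keeps the counts as doubles, so nothing is turned into NA
  m <- matrix(unlist(dat[i, 2:5]), nrow = 2)
  chisq.test(m)$p.value
})
names(pvalues) <- dat$id
pvalues
```

Collecting the p-values with sapply() also avoids growing `pvalues` with cbind() inside the loop and mixing it with the character label "pvalue".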
Re: [R] How to remove attributes from scale() in a matrix?
HI, Try: x1<-scale(x,center=TRUE,scale=TRUE) str(x1) # num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ... # - attr(*, "scaled:center")= num [1:10] 50.2 50 49.8 49.8 50.3 ... #- attr(*, "scaled:scale")= num [1:10] 1.109 0.956 0.817 0.746 1.019 ... attr(x1,"scaled:center")<-NULL attr(x1,"scaled:scale")<-NULL str(x1) #num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ... A.K. - Original Message - From: C W To: r-help Cc: Sent: Tuesday, July 16, 2013 3:59 PM Subject: [R] How to remove attributes from scale() in a matrix? Hi list, I am using scale() to standardize a distribution? But why does it give me attributes attached to the data? I just want a standardized matrix, that is all. library(mvtnorm) > x <- rmvnorm(15, mean=rep(50, 10)) > x [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [1,] 51.17519 52.34341 49.63084 47.99234 51.63113 50.91391 49.36819 49.23901 51.17377 [2,] 50.57039 49.17210 48.64395 49.03940 49.65761 49.93840 49.94883 50.69044 49.57632 [3,] 50.64811 50.21503 50.13786 49.15879 48.51550 50.19444 50.23710 50.98040 51.37032 [4,] 49.22797 49.66445 49.93287 48.63681 50.49457 50.33302 52.29552 49.98424 51.04724 [5,] 49.72099 50.84510 50.60976 49.60883 53.59509 49.14728 50.23134 49.09141 49.23780 [6,] 49.49126 50.90938 49.67140 50.08951 49.79854 49.03711 50.26037 50.24975 48.26958 [7,] 51.12384 47.92778 50.60112 49.01554 49.47515 50.12756 51.65216 49.21998 49.63808 [8,] 51.45123 50.44037 50.01039 50.27511 49.97658 51.63002 50.37156 50.02685 48.95423 [9,] 51.16989 50.16200 51.17724 50.71678 50.79565 50.27128 51.05608 49.61165 47.81732 [10,] 49.54263 49.93501 49.71762 49.33378 51.44935 51.53775 50.54346 49.98333 49.59422 [11,] 51.16497 49.82914 49.08821 51.02918 49.67663 49.53498 50.26647 49.48569 50.94504 [12,] 51.16827 50.50244 49.13003 49.00155 50.26457 48.85465 49.11593 50.58031 51.14926 [13,] 48.26216 49.94866 48.62526 49.11995 50.40082 49.25359 48.57677 50.66760 49.44108 [14,] 49.82530 49.17352 50.05588 50.51265 51.04926 50.32474 49.78180 
50.48349 49.92431 [15,] 50.55772 49.84691 47.95021 50.24911 49.85335 50.73062 51.48718 51.36693 50.18307 [,10] [1,] 50.13859 [2,] 51.54920 [3,] 49.23230 [4,] 50.92683 [5,] 50.97708 [6,] 50.78799 [7,] 50.53913 [8,] 49.30832 [9,] 49.43606 [10,] 49.42060 [11,] 50.21002 [12,] 51.94848 [13,] 49.41352 [14,] 52.24064 [15,] 51.19474 > scale(x, center=TRUE, scale=TRUE) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] 0.8890317 2.3390090 -0.040395734 -1.86089754 1.00159470 0.92533476 -0.99715965 [2,] 0.2452502 -0.9109703 -1.190404546 -0.63771097 -0.66104313 -0.21446975 -0.40514793 [3,] 0.3279747 0.1578297 0.550427419 -0.49823662 -1.62323564 0.08468695 -0.11121941 [4,] -1.1837031 -0.4064112 0.311551281 -1.10802250 0.04407804 0.24660932 1.98754311 [5,] -0.6589074 0.8035298 1.100314901 0.02749734 2.65618150 -1.13883336 -0.11709623 [6,] -0.9034419 0.8694088 0.006865424 0.58904255 -0.54231158 -1.26755243 -0.08749646 [7,] 0.8343705 -2.1861602 1.090250934 -0.66558751 -0.81476108 0.00655050 1.33157578 [8,] 1.1828615 0.3887665 0.401888014 0.80585326 -0.39231482 1.76205433 0.02587038 [9,] 0.8833860 0.1034854 1.761589113 1.32181395 0.29773018 0.17447101 0.72381232 [10,] -0.8487569 -0.1291363 0.060728488 -0.29381647 0.84844800 1.65423826 0.20113824 [11,] 0.8781560 -0.2376361 -0.672712386 1.68676651 -0.64501776 -0.68583461 -0.08127647 [12,] 0.8816611 0.4523675 -0.623990804 -0.68193230 -0.14968994 -1.48074503 -1.25436403 [13,] -2.2117715 -0.1151423 -1.212190165 -0.54361597 -0.03490809 -1.01461386 -1.80409304 [14,] -0.5478718 -0.9095198 0.454889973 1.08335795 0.51138447 0.23693367 -0.57544865 [15,] 0.2317608 -0.2194204 -1.998811911 0.77548830 -0.49613484 0.71117025 1.16336204 [,8] [,9] [,10] [1,] -1.2666791 1.17934107 -0.35061189 [2,] 0.8423439 -0.28600703 1.06389274 [3,] 1.2636749 1.35963155 -1.25940610 [4,] -0.1838111 1.06327078 0.43980595 [5,] -1.4811512 -0.59653247 0.49019839 [6,] 0.2019965 -1.48467432 0.30058779 [7,] -1.2943281 -0.22935616 0.05103467 [8,] -0.1218950 -0.85664924 -1.18316969 
[9,] -0.7252082 -1.89953387 -1.05507780 [10,] -0.1851315 -0.26958168 -1.07058564 [11,] -0.9082374 0.96952049 -0.27897948 [12,] 0.6823136 1.15685832 1.46428187 [13,] 0.8091562 -0.41005981 -1.07768018 [14,] 0.5416286 0.03320871 1.75724879 [15,] 1.8253278 0.27056365 0.70846058 attr(,"scaled:center") [1] 50.33999 50.06102 49.66551 49.58529 50.44225 50.12196 50.34618 50.11074 49.88811 [10] 50.48823 attr(,"scaled:scale") [1] 0.9394453 0.9757930 0.8581604 0.8560117 1.1869812 0.8558562 0.9807762 0.6882016 [9] 1.0901550 0.9972455 Also, > attributes(x) <- NULL will not work since this is matrix not vector. Thanks, Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-pr
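The attribute removal discussed in this thread can be checked on a small simulated matrix. This is a sketch, not the poster's data: the matrix is generated with rnorm() rather than rmvnorm() so that no extra package is needed, and `strip_scale_attrs` is a made-up helper name.

```r
set.seed(42)
x  <- matrix(rnorm(15 * 10, mean = 50), nrow = 15)
x1 <- scale(x, center = TRUE, scale = TRUE)
names(attributes(x1))   # dim plus the two "scaled:" attributes

# Option 1: remove the two attributes by name
x2 <- x1
attr(x2, "scaled:center") <- NULL
attr(x2, "scaled:scale")  <- NULL

# Option 2: hypothetical helper that rebuilds a plain matrix,
# keeping only the dimensions and any dimnames
strip_scale_attrs <- function(m) {
  matrix(as.vector(m), nrow = nrow(m), dimnames = dimnames(m))
}
x3 <- strip_scale_attrs(x1)

names(attributes(x2))
identical(x2, x3)
```

Both options leave `dim` in place, which is why they work where `attributes(x) <- NULL` would flatten the matrix into a vector.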
Re: [R] How to remove attributes from scale() in a matrix?
Hi Mike,

If you check ?scale:

  For ‘scale.default’, the centered, scaled matrix. The numeric centering and scalings used (if any) are returned as attributes ‘"scaled:center"’ and ‘"scaled:scale"’

By checking the source code:

methods(scale)
getAnywhere('scale.default')
function (x, center = TRUE, scale = TRUE)
{
    x <- as.matrix(x)
    nc <- ncol(x)
    if (is.logical(center)) {
        if (center) {
            center <- colMeans(x, na.rm = TRUE)
            x <- sweep(x, 2L, center, check.margin = FALSE)
        }
    }
    else if (is.numeric(center) && (length(center) == nc))
        x <- sweep(x, 2L, center, check.margin = FALSE)
    else stop("length of 'center' must equal the number of columns of 'x'")
    if (is.logical(scale)) {
        if (scale) {
            f <- function(v) {
                v <- v[!is.na(v)]
                sqrt(sum(v^2)/max(1, length(v) - 1L))
            }
            scale <- apply(x, 2L, f)
            x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
        }
    }
    else if (is.numeric(scale) && length(scale) == nc)
        x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
    else stop("length of 'scale' must equal the number of columns of 'x'")
    if (is.numeric(center))
        attr(x, "scaled:center") <- center
    if (is.numeric(scale))
        attr(x, "scaled:scale") <- scale
    x
}

#You can comment out the last few lines:
scale1<- function (x, center = TRUE, scale = TRUE)
{
    x <- as.matrix(x)
    nc <- ncol(x)
    if (is.logical(center)) {
        if (center) {
            center <- colMeans(x, na.rm = TRUE)
            x <- sweep(x, 2L, center, check.margin = FALSE)
        }
    } else if (is.numeric(center) && (length(center) == nc))
        x <- sweep(x, 2L, center, check.margin = FALSE)
    else stop("length of 'center' must equal the number of columns of 'x'")
    if (is.logical(scale)) {
        if (scale) {
            f <- function(v) {
                v <- v[!is.na(v)]
                sqrt(sum(v^2)/max(1, length(v) - 1L))
            }
            scale <- apply(x, 2L, f)
            x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
        }
    } else if (is.numeric(scale) && length(scale) == nc)
        x <- sweep(x, 2L, scale, "/", check.margin = FALSE)
    else stop("length of 'scale' must equal the number of columns of 'x'")
    #if (is.numeric(center))
    #    attr(x, "scaled:center") <- center
    #if (is.numeric(scale))
    #    attr(x, "scaled:scale") <- scale
    x
}

x2<-scale1(x,center=TRUE,scale=TRUE)
str(x2)
# num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
identical(x1,x2)
#[1] TRUE

A.K.

----- Original Message -----
From: C W
To: arun
Cc: R help
Sent: Tuesday, July 16, 2013 6:58 PM
Subject: Re: [R] How to remove attributes from scale() in a matrix?

Arun, thanks for the quick response. That helps.

Why does scale() give attributes? What's the point of that? I don't see apply() or any similar functions do it. Just for my curiosity.

Mike

On Tue, Jul 16, 2013 at 4:07 PM, arun wrote:
> Hi,
> Try:
> x1<-scale(x,center=TRUE,scale=TRUE)
> str(x1)
> # num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
> # - attr(*, "scaled:center")= num [1:10] 50.2 50 49.8 49.8 50.3 ...
> # - attr(*, "scaled:scale")= num [1:10] 1.109 0.956 0.817 0.746 1.019 ...
>
> attr(x1,"scaled:center")<-NULL
> attr(x1,"scaled:scale")<-NULL
> str(x1)
> # num [1:15, 1:10] -0.2371 -0.5606 -0.8242 1.5985 -0.0164 ...
> A.K.
>
> ----- Original Message -----
> From: C W
> To: r-help
> Cc:
> Sent: Tuesday, July 16, 2013 3:59 PM
> Subject: [R] How to remove attributes from scale() in a matrix?
>
> Hi list,
>
> I am using scale() to standardize a distribution. But why does it
> give me attributes attached to the data? I just want a standardized
> matrix, that is all.
>
> library(mvtnorm)
>> x <- rmvnorm(15, mean=rep(50, 10))
>> x
>          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
> [1,] 51.17519 52.34341 49.63084 47.99234 51.63113 50.91391 49.36819 49.23901 51.17377
> [2,] 50.57039 49.17210 48.64395 49.03940 49.65761 49.93840 49.94883 50.69044 49.57632
> [3,] 50.64811 50.21503 50.13786 49.15879 48.51550 50.19444 50.23710 50.98040 51.37032
> [4,] 49.22797 49.66445 49.93287 48.63681 50.49457 50.33302 52.29552 49.98424 51.04724
> [5,] 49.72099 50.84510 50.60976 49.60883 53.59509 49.14728 50.23134 49.0914
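On the "why attributes?" question in this thread, a minimal sketch may help. It assumes made-up data (base `rnorm` instead of `mvtnorm::rmvnorm`) and a hypothetical helper name, `strip_scale_attrs`, which is not part of base R. It shows the two attributes being dropped without copying the source of `scale.default`, and one thing the attributes are useful for: undoing the standardization.

```r
# Hypothetical helper (not part of base R): drop only the two attributes
# that scale() adds, leaving dim and dimnames alone.
strip_scale_attrs <- function(m) {
  attr(m, "scaled:center") <- NULL
  attr(m, "scaled:scale") <- NULL
  m
}

set.seed(1)                                   # made-up data for illustration
x <- matrix(rnorm(150, mean = 50), nrow = 15)

xs <- scale(x)                                # carries both attributes
x_plain <- strip_scale_attrs(xs)
names(attributes(x_plain))                    # inspect what is left

# One use of the stored attributes: map the scaled values back to the
# original units (multiply by the column scales, then add the centers).
x_back <- sweep(sweep(x_plain, 2, attr(xs, "scaled:scale"), "*"),
                2, attr(xs, "scaled:center"), "+")
all.equal(x_back, x)
```

Keeping the helper separate from `scale()` means the centers and scales stay available on `xs` for later back-transformation, which is probably the main reason `scale()` returns them.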
Re: [R] writing multiple lines to a file
Hi,

Maybe this helps:

printer1<- file("out1.txt","w")
write(sprintf("This is line %d.\n",1),printer1,append=TRUE)
write("This is line 2",printer1,append=TRUE)
close(printer1)

#or
printer1<- file("out1.txt","w")
writeLines("This is line",con=printer1,sep="\n")
writeLines("This is line 2",con=printer1)
close(printer1)

A.K.

Hello, I am trying to write multiple lines to a file, but I only seem to be able to write the last line.

printer = file("out.txt")
write(sprintf("This is line %d.\n",1),printer,append=T)
write("This is line 2.",printer,append=T)
close(printer)

How can I fix this? I would like to be able to do this in a for-loop with hundreds of elements.
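Since the question mentions a for-loop, here is a minimal sketch of that case. It assumes a temporary file (via `tempfile()`) instead of "out.txt", so nothing in the working directory is touched. The point is that the connection is opened once in "w" mode and stays open for the whole loop, so each `writeLines` call appends to what was written before.

```r
out <- tempfile(fileext = ".txt")   # assumption: a scratch file for the demo
con <- file(out, "w")               # open once, in write mode

for (i in 1:3) {
  # writeLines adds the line terminator itself, so no "\n" is needed here
  writeLines(sprintf("This is line %d.", i), con)
}

close(con)
readLines(out)
```

If all the lines are available up front, a single call such as `writeLines(sprintf("This is line %d.", 1:3), out)` would avoid the loop altogether.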
Re: [R] Splitting dataframes and cleaning extraneous characters
Hi,

You could try ?split:

split(ats,ats$Project_NBR)

You also mentioned two columns:

split(ats,list(ats$col1, ats$col2))

You should have provided an example dataset using ?dput ( dput(head(data,10)) ) for testing.

Also,

gsub("^-[^-]*-","","-005-190")
#[1] "190"

A.K.

Problem: I have a large data set and need to separate it based on factors in 2 columns. The final output would be a collection of dataframes renamed to the corresponding factor levels.

So far I know that for each corresponding factor I can execute

x190<-ats[which(Project_NBR=='-005-190'),]

However, there are about 400 factors needing to be separated. Also, I would like to remove the "-005-". Any guidance will be greatly appreciated.
Re: [R] writing multiple lines to a file
Hi,

No problem. You could try:

printer = file("out.txt","w")
writeLines("This is line.",con=printer,sep=" ")
writeLines("The same line.",con=printer)
close(printer)

#or
cat(sprintf("This is line %d. ",1),file="out.txt",append=TRUE)
cat("The same line.",file="out.txt",append=TRUE)

A.K.

Thank you very much, I just have one more simple question. It worked with writing "w" when opening the file. However, another problem occurred. When I wrote \n, it went two lines down, so I had to do this, without \n:

printer = file("out.txt","w")
write(sprintf("This is line %d.",1),printer,append=T)
write("This is line 2.",printer,append=T)
close(printer)

However, sometimes I do not want to start on a new line; it depends on the situation. That is, I may write something to a file and then want to add a new string to the same line: " The same line." Like this:

printer = file("out.txt","w")
write(sprintf("This is line %d.",1),printer,append=T)
write(" The same line.",printer,append=T)
close(printer)

But the output is:

This is line 1.
 The same line.

How can I make it stop going to the new line automatically?
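The difference between `write` and `cat` discussed above can be sketched as follows, assuming a temporary file in place of "out.txt". `cat` writes exactly what it is given and adds no newline of its own, so the line ends only when a "\n" is written explicitly.

```r
out <- tempfile(fileext = ".txt")   # assumption: a scratch file for the demo
con <- file(out, "w")

cat("This is line 1.", file = con)        # no newline is added
cat(" The same line.", file = con)        # continues on the same line
cat("\n", file = con)                     # end the line explicitly
cat("This is line 2.\n", file = con)

close(con)
readLines(out)
```

Writing to an open connection rather than to a file name also avoids reopening the file on every call.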
Re: [R] Splitting dataframes and cleaning extraneous characters
Hi,

One problem with using ?substring would be that it depends upon the number of digits, characters, etc. For example:

substring("-005-190",6)
#[1] "190"
substring("-0057-190",6)
#[1] "-190"

#whereas
gsub("^-[^-]*-","","-0057-190")
#[1] "190"

Probably, your dataset doesn't have that sort of problem.

dat1<- read.table(text="
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
",sep="",header=TRUE,stringsAsFactors=FALSE)

res<-split(dat1,gsub("\\.","",as.character(interaction(dat1[,2],dat1[,1]))))
res
#$k134
#  project boro
#2     134    k
#7     134    k
#
#$m123
#  project boro
#1     123    m
#3     123    m
#4     123    m
#
#$q543
#  project boro
#5     543    q
#6     543    q

str(res$k134)
#'data.frame': 2 obs. of 2 variables:
# $ project: int 134 134
# $ boro   : chr "k" "k"

A.K.

I was able to split off the extraneous stuff using a<-substring(Project_NBR, first=6) and then cbind to add the edited column to the df.

I have a sample but I am not sure how to provide it to you. I will try to produce an example that's similar to what I have:

project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k

Basically I am trying to subset the data frame according to project and boro, with the name of each subset being boro-project (e.g. m123, k134).

I hope this provides more clarity to my problem.
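Putting the two pieces of this thread together, a minimal sketch might look like the following. The data frame `ats` and its values are made up for illustration; only the column names `Project_NBR` and `boro` come from the thread. The prefix is stripped with `sub`, and the list is split on a pasted key so each element is already named boro-project.

```r
# Assumption: a toy stand-in for the poster's 'ats' data
ats <- data.frame(Project_NBR = c("-005-190", "-005-191", "-005-190", "-0057-200"),
                  boro        = c("m", "k", "m", "q"),
                  value       = 1:4,
                  stringsAsFactors = FALSE)

# Strip the leading "-xxx-" block regardless of how many characters it has
ats$project <- sub("^-[^-]*-", "", ats$Project_NBR)

# Split on the combined key; only combinations that occur get an element
res <- split(ats, paste0(ats$boro, ats$project))
names(res)
res$m190
```

Keeping the pieces in one named list is usually easier to work with than creating some 400 separate objects; `list2env(res, envir = .GlobalEnv)` would do the latter if it is really needed.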
Re: [R] simplify a dataframe
Hi,

You could try:

df1[,1:2]<-lapply(df1[,1:2],as.character)
df2New<- data.frame(Deb=unique(with(df1,ave(Debut,INDX,FUN=function(x) head(x,1)))),
                    Fin=unique(with(df1,ave(Fin,INDX,FUN=function(x) tail(x,1)))))
identical(df2New,df2)
#[1] TRUE

A.K.

----- Original Message -----
From: Arnaud Michel
To: Rui Barradas; R help; arun
Cc:
Sent: Wednesday, July 17, 2013 4:03 PM
Subject: Re: [R] simplify a dataframe

Thank you for the answer to question (1).
Sorry for the imprecision in question (2). Suppose the data frame df1:

df1 <- data.frame(
Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995", "28/02/1995", "01/03/1995",
          "13/03/1995", "01/01/1996", "31/01/1996"),
Fin = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", "28/02/1995", "12/03/1995",
        "30/06/1995", "30/01/1996", "31/01/1996"),
INDX = c(6,6,6, 11,11,11, 4, 5,5) )

I would like to replace df1 by df2:

df2 <- data.frame(
Deb = c("24/01/1995", "02/02/1995", "13/03/1995", "01/01/1996"),
Fin = c("31/12/1997", "12/03/1995", "30/06/1995", "31/01/1996") )

Explanation:
Rows 1, 2, 3 of df1 (which share INDX = 6) are replaced by a single row with Deb = Debut of row 1 of df1 and Fin = Fin of row 3 of df1.
Rows 4, 5, 6 of df1 (which share INDX = 11) are replaced by a single row with Deb = Debut of row 4 of df1 and Fin = Fin of row 6 of df1.
Row 7 of df1 (INDX = 4) is replaced by a single row with Deb = Debut of row 7 of df1 and Fin = Fin of row 7 of df1 ==> no change.
Rows 8, 9 of df1 (which share INDX = 5) are replaced by a single row with Deb = Debut of row 8 of df1 and Fin = Fin of row 9 of df1.

df1
       Debut        Fin INDX
1 24/01/1995 30/04/1997    6
2 01/05/1997 30/12/1997    6
3 31/12/1997 31/12/1997    6
4 02/02/1995 27/02/1995   11
5 28/02/1995 28/02/1995   11
6 01/03/1995 12/03/1995   11
7 13/03/1995 30/06/1995    4
8 01/01/1996 30/01/1996    5
9 31/01/1996 31/01/1996    5

df2
         Deb        Fin
1 24/01/1995 31/12/1997
2 02/02/1995 12/03/1995
3 13/03/1995 30/06/1995
4 01/01/1996 31/01/1996

Thank you for your help
Michel

On 17/07/2013 19:57, Rui Barradas wrote:
> Hello,
>
> As for question (1), try the following.
>
> y2 <- cumsum(c(TRUE, diff(x1) > 0))
> identical(as.integer(y1), y2) # y1 is of class "numeric"
>
> As for question (2) I'm not understanding it.
>
> Hope this helps,
>
> Rui Barradas
>
> On 17-07-2013 18:21, Arnaud Michel wrote:
>> Hi Arun
>>
>> I have two more questions about simplifying a dataframe.
>>
>> I would like
>> 1) to transform the vector x1 into the vector y1
>> x1 <- c(1,1,1,-1000, 1,-1000, 1,1,1,1,1,1,-1000)
>> y1 <- c(1,1,1,1, 2,2, 3,3,3,3,3,3,3)
>>
>> 2) to transform the vectors Debut and Fin, taking INDX into account,
>> into the two vectors Deb and Fin
>> Debut <- c(
>> "24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995", "28/02/1995", "01/03/1995",
>> "13/03/1995", "01/01/1996", "31/01/1996", "24/01/1995", "01/07/1995", "01/09/1995",
>> "01/07/1997", "01/01/1998", "01/08/1998", "01/01/2000", "17/01/2000", "29/02/2000")
>>
>> Fin <- c(
>> "30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", "28/02/1995", "12/03/1995",
>> "30/06/1995", "30/01/1996", "31/01/1996", "30/06/1995", "31/08/1995", "30/06/1997",
>> "31/12/1997", "31/07/1998", "31/12/1999", "16/01/2000", "28/02/2000", "29/02/2000")
>>
>> INDX <- c(6,6,6, 11,11,11, 4, 5,5)
>>
>> Deb <- c("*24/01/1995*", "*02/02/1995*", "*13/03/1995*", "*01/01/1996*")
>> Fin <- c("*31/12/1997*", "*12/03/1995*", "*30/06/1995*", "*31/01/1996*")
>>
>> Debut        Fin          INDX
>> *24/01/1995* 30/04/1997   6
>> 01/05/1997   30/12/1997   6
>> 31/12/1997   *31/12/1997* 6
>> *02/02/1995* 27/02/1995   11
>> 28/02/1995   28/02/1995   11
>> 01/03/1995   *12/03/1995* 11
>> *13/03/1995* *30/06/1995* 4
>> *01/01/1996* 30/01/1996   5
>> 31/01/1996   *31/01/1996* 5
>>
>> Thanks for your help

--
Michel ARNAUD
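The "first Debut, last Fin per INDX" rule described in this thread can also be sketched in base R with `duplicated`. This is an alternative to the `ave` approach, not the method used in the replies, and it assumes that the rows of each INDX group are contiguous and already in chronological order, as they are in the example data.

```r
df1 <- data.frame(
  Debut = c("24/01/1995", "01/05/1997", "31/12/1997", "02/02/1995", "28/02/1995",
            "01/03/1995", "13/03/1995", "01/01/1996", "31/01/1996"),
  Fin   = c("30/04/1997", "30/12/1997", "31/12/1997", "27/02/1995", "28/02/1995",
            "12/03/1995", "30/06/1995", "30/01/1996", "31/01/1996"),
  INDX  = c(6, 6, 6, 11, 11, 11, 4, 5, 5),
  stringsAsFactors = FALSE)

first_row <- !duplicated(df1$INDX)                   # first row of each group
last_row  <- !duplicated(df1$INDX, fromLast = TRUE)  # last row of each group

df2New <- data.frame(Deb = df1$Debut[first_row],
                     Fin = df1$Fin[last_row],
                     stringsAsFactors = FALSE)
df2New
```

Because both logical vectors pick one row per group in order of appearance, the groups come out in the original order (6, 11, 4, 5) without any re-sorting step.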
Re: [R] simplify a dataframe
#or
library(plyr)
res<-ddply(df1,.(INDX),summarize,Debut=head(Debut,1),Fin=tail(Fin,1))
res$INDX<-factor(res$INDX,levels=unique(df1$INDX))
res[order(res$INDX),-1]
#       Debut        Fin
#3 24/01/1995 31/12/1997
#4 02/02/1995 12/03/1995
#1 13/03/1995 30/06/1995
#2 01/01/1996 31/01/1996

A.K.
Re: [R] cut into groups of equal nr of elements...
Hi,

Not sure whether this is what you wanted.

vec1<- 1:7
fun1<- function(x,nr) {((x-1)%/%nr)+1}
fun1(vec1,2)
#[1] 1 1 2 2 3 3 4
fun1(vec1,3)
#[1] 1 1 1 2 2 2 3
split(vec1,fun1(vec1,2))

A.K.

----- Original Message -----
From: Witold E Wolski
To: r-help@r-project.org
Cc:
Sent: Wednesday, July 17, 2013 5:43 PM
Subject: [R] cut into groups of equal nr of elements...

I would like to "cut" a vector into groups with an equal number of elements. I am looking for a function along the lines of cut, but where I can specify the size of the groups instead of the number of groups.

--
Witold Eryk Wolski
Re: [R] cut into groups of equal nr of elements...
Sorry, there was a mistake: fun1 should be:

fun1<- function(x,nr) {((seq_along(x)-1)%/%nr)+1}
vec3<- c(4,5,7,9,8,5)
fun1(vec3,2)
#[1] 1 1 2 2 3 3
split(vec3,fun1(vec3,2))

A.K.
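The corrected idea (group by position, not by value) can be wrapped up as a small function. The name `chunk` is hypothetical, and `ceiling(seq_along(x) / size)` is just another way of writing the index arithmetic from the reply above; the last group is shorter when the length is not a multiple of the group size.

```r
# Hypothetical helper: split x into consecutive groups of 'size' elements
chunk <- function(x, size) {
  split(x, ceiling(seq_along(x) / size))
}

vec <- c(4, 5, 7, 9, 8, 5, 1)     # made-up values, deliberately not 1:n
groups <- chunk(vec, 3)
groups
sapply(groups, length)            # size of each group
```

Using positions rather than values is what makes this work for arbitrary vectors; the first version of `fun1` only gave the right answer because `vec1` happened to be `1:7`.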
Re: [R] combine select data from 2 dataframes sharing same variables
Hi,

Not sure if this is what you wanted:

#If columns are arranged in the same order in both data.frames.
lst1<-lapply(seq_len(ncol(StatsUTAH)),function(i) {
  x1<-cbind(StatsUTAH[,i],sStatsUTAH[,i])
  row.names(x1)<-row.names(StatsUTAH)
  colnames(x1)<-c("zeroNO","zeroYES")
  x1})
names(lst1)<- colnames(StatsUTAH)

A.K.

----- Original Message -----
From: bcrombie
To: r-help@r-project.org
Cc:
Sent: Wednesday, July 17, 2013 4:12 PM
Subject: [R] combine select data from 2 dataframes sharing same variables

# The following dataframes are the result of two analyses performed on the same set of numeric data.
# The first analysis involved calculations that did not include zero values:
StatsUTAH = data.frame(
  MWtotaleesDue = c(8.428571,2.496256,7,6.604472,1,17,3.593998,4.834573,12.02257),
  OTtotaleesDue = c(6.6,2.242023,3,7.089899,1,23,3.100782,3.499218,9.700782),
  OTtotalBWsDue = c(559.944,305.7341,257.55,966.816,15.19,3232.97,422.839,137.105,982.783),
  TotalBWsFD = c(693.2973,265.0846,267.58,1026.6682,15.19,3232.97,356.5468,336.7505,1049.8442))
rownames(StatsUTAH)<- c("Mean","StdError", "Median", "StdDev", "Min", "Max", "NinetyPct", "NinetyPctLower", "NinetyPctUpper")
StatsUTAH

# The second analysis involved calculations that included zero values:
sStatsUTAH = data.frame(
  MWtotaleesDue = c(0.9076923,0.411799,0,3.3200295,0,17,0.5332467,0.3744456,1.440939),
  OTtotaleesDue = c(1.0153846,0.4442433,0,3.5816036,0,23,0.5752594,0.4401252,1.590644),
  OTtotalBWsDue = c(86.14523,51.5752,0,415.81256,0,3232.97,66.78575,19.35948,152.93098),
  TotalBWsFD = c(159.99169,69.86036,0,563.23225,0,3232.97,90.46357,69.52812,250.45526))
rownames(sStatsUTAH)<- c("sMean","sStdError", "sMedian", "sStdDev", "sMin", "sMax", "sNinetyPct", "sNinetyPctLower", "sNinetyPctUpper")
sStatsUTAH

# Rows 1-9 may have different names in each dataframe but are the same corresponding calculation in both.
# I need to combine these data so that the OUTPUT is a SEPARATE table (or matrix or whatever)
# FOR EACH VARIABLE SHARED BY THE DATAFRAMES that I can place in a Word document (which I can handle later with RTF).
# This is how I've mapped it out in my head, but I need to convert it to R:
# StatsUTAH  --- data for "zeroNO"
# sStatsUTAH --- data for "zeroYES"
#
# Table 1: MWtotaleesDue
# colnames("zeroNO", "zeroYES")
# rownames("Mean","StdError", "Median", "StdDev", "Min", "Max", "NinetyPct", "NinetyPctLower", "NinetyPctUpper")
#
# Table 2: OTtotaleesDue
# same colnames & rownames as Table 1
#
# Table 3: OTtotalBWsDue
# same colnames & rownames as Table 1
#
# Table 4: TotalBWsFD
# same colnames & rownames as Table 1

# WHAT IS THE BEST WAY TO DO THIS IN R?
# While a loop may be more efficient, is there also a good way to create each table separately?
# Note: my real dataframes (StatsUTAH, etc.) will have a lot more variables than what are listed in this example,
# so I will probably be picking and choosing which ones I'm interested in creating tables for.
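For the "picking and choosing" part of the question, a minimal sketch that looks columns up by name rather than by position might be useful. The two small data frames and the names `no_zero`, `with_zero`, `wanted` and `tabs` are made up for illustration; the real `StatsUTAH` and `sStatsUTAH` would take their place. It assumes the rows of both data frames are in the same order.

```r
# Assumption: toy stand-ins for StatsUTAH and sStatsUTAH
no_zero   <- data.frame(MW = c(8.4, 7), OT = c(6.6, 3),
                        row.names = c("Mean", "Median"))
with_zero <- data.frame(OT = c(1.0, 0), MW = c(0.9, 0),   # different column order
                        row.names = c("sMean", "sMedian"))

# Choose the variables to tabulate; intersect() would take all shared ones
wanted <- c("MW", "OT")

tabs <- lapply(setNames(wanted, wanted), function(v) {
  m <- cbind(zeroNO = no_zero[[v]], zeroYES = with_zero[[v]])
  rownames(m) <- rownames(no_zero)   # use the row names of the first frame
  m
})

tabs$MW
```

Matching on column names means the result does not depend on both data frames listing their variables in the same order, which the positional version does assume.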
Re: [R] Merge with transposed matrix.
Hi,

m1<- matrix(NA,5,5)
m1[upper.tri(m1)]<-c(2,3,8,4,9,14,5,10,15,20)

One way would be:

m1[lower.tri(m1)]<-t(m1)[lower.tri(t(m1))]
m1
#     [,1] [,2] [,3] [,4] [,5]
#[1,]   NA    2    3    4    5
#[2,]    2   NA    8    9   10
#[3,]    3    8   NA   14   15
#[4,]    4    9   14   NA   20
#[5,]    5   10   15   20   NA

A.K.

Hello!

I would like to have some simple syntax in order to fill my matrix with its transpose. That is, as an example, I have a correlation matrix like this, and the transposed one:

> matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    2    3    4    5
[2,]   NA   NA    8    9   10
[3,]   NA   NA   NA   14   15
[4,]   NA   NA   NA   NA   20
[5,]   NA   NA   NA   NA   NA

> transposed.matrix<-t(matrix)
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]    2   NA   NA   NA   NA
[3,]    3    8   NA   NA   NA
[4,]    4    9   14   NA   NA
[5,]    5   10   15   20   NA

And I would like to have

     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    2    3    4    5
[2,]    2   NA    8    9   10
[3,]    3    8   NA   14   15
[4,]    4    9   14   NA   20
[5,]    5   10   15   20   NA

Thank you very much for your help!
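The mirroring step can be wrapped in a small function and checked, as in the sketch below. The name `mirror_upper` is hypothetical; the body is the same assignment as in the reply, applied to a copy so the input is left unchanged.

```r
# Hypothetical helper: copy the upper triangle of a square matrix
# into its lower triangle, leaving the diagonal as it is.
mirror_upper <- function(m) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  m[lower.tri(m)] <- t(m)[lower.tri(m)]
  m
}

m1 <- matrix(NA_real_, 5, 5)
m1[upper.tri(m1)] <- c(2, 3, 8, 4, 9, 14, 5, 10, 15, 20)

sym <- mirror_upper(m1)
sym
identical(sym, t(sym))    # check that the result equals its transpose
```

For a correlation matrix the diagonal would normally be set afterwards with `diag(sym) <- 1`; that step is left out here because the example in the thread keeps NA on the diagonal.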