[R] Selecting Variables
Hi All,

I have a dataset that I want to inspect dynamically for the number of variables whose names start with "Exposure_", and then count the entries in those variables for each case, i.e.

ID Exposure_1 Exposure_2 Exposure_3
1  y          y          y
2  y          y          -
3  y          -          -

The corresponding new variables would be

ID Max_Exposure Unique_Exposure
1  3            3
2  3            2
3  3            1

I know this may seem fairly basic, but it will give me a starting point for developing more advanced things with loops and natural language.

Thanks in advance
Mike

[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Selecting Variables
Thanks for the help guys; I think I needed to be a bit more explicit, however (sorry).

There are lots of variables between each exposure, and the values are nominal with up to 6 values. To add to the problem, the datasets I deal with range up to 5 GB. My guess is that the melt function would be inefficient in this situation.

I was looking at the agrep function to count the number of Exposure variables in names(). I wasn't sure how to count whether there was a value in each one, but y[complete.cases(y), ] looks like a nice approach. Is this a good path to follow?

On Tue, Aug 5, 2008 at 3:09 PM, jim holtman <[EMAIL PROTECTED]> wrote:

> I am not sure where the "Max" comes from, but this might be a start for you:
>
> > x <- read.table(textConnection("ID Exposure_1 Exposure_2 Exposure_3
> + 1 y y y
> + 2 y y -
> + 3 y - -"), header=TRUE, na.strings='-')
> > closeAllConnections()
> > require(reshape)
> > y <- melt(x, id.var='ID')
> > # get rid of NAs
> > y <- y[complete.cases(y),]
> > y
>   ID   variable value
> 1  1 Exposure_1     y
> 2  2 Exposure_1     y
> 3  3 Exposure_1     y
> 4  1 Exposure_2     y
> 5  2 Exposure_2     y
> 7  1 Exposure_3     y
> > cbind(Unique=tapply(y$ID, y$ID, length))
>   Unique
> 1      3
> 2      2
> 3      1
>
> --
> Jim Holtman
> Cincinnati, OH
>
> What is the problem that you are trying to solve?

--
Michael Pearmain
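For datasets where reshaping with melt is too costly, the counts can probably be taken directly on the selected columns. The following is only a sketch in base R: it assumes that "-" marks a missing entry and that Max_Exposure simply means the number of "Exposure_" columns (the thread leaves that definition open). The names exposure.cols and result are made up for illustration.

```r
# Toy data from the original post; "-" is read as NA (an assumption).
x <- read.table(text = "ID Exposure_1 Exposure_2 Exposure_3
1 y y y
2 y y -
3 y - -", header = TRUE, na.strings = "-", stringsAsFactors = FALSE)

# Select the columns by name prefix, ignoring everything in between.
exposure.cols <- grep("^Exposure_", names(x))

result <- data.frame(
  ID = x$ID,
  # Assumed meaning: how many Exposure_ variables exist at all.
  Max_Exposure = length(exposure.cols),
  # Non-missing entries per case, counted row-wise without reshaping.
  Unique_Exposure = rowSums(!is.na(x[exposure.cols]))
)
print(result)
```

Because grep() works on names(x) only, the other variables sitting between the exposure columns never get touched.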
[R] Flag variable
Hi All,

I have 4000 cases containing string variables. I want to do some fuzzy matching and create a new variable of the same length holding 0s and 1s. If I use the code

    test <- agrep("web Klick", ETC$Exposure.Type, max = 2, ignore.case = TRUE)

it works, but I get

    > length(test)
    [1] 3127

This returns the indices of the cases that do match. Can someone tell me how to map this back onto the dataset (ETC) as 1 and 0?
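Since agrep() returns the positions of the matching elements, one way to get a 0/1 flag of full length is to test every position for membership in that result. A minimal sketch follows; the vector exposure.type is invented stand-in data for ETC$Exposure.Type, and max.distance is spelled out in full rather than abbreviated to max.

```r
# Stand-in for ETC$Exposure.Type (hypothetical values).
exposure.type <- c("web click", "Web Klick", "banner", "email")

# Positions of the approximate matches, as in the original call.
hits <- agrep("web Klick", exposure.type, max.distance = 2, ignore.case = TRUE)

# 1 where the position is among the hits, 0 elsewhere; same length as the input.
flag <- as.integer(seq_along(exposure.type) %in% hits)
print(flag)
```

Recent versions of R also provide agrepl(), which returns a logical vector directly and could be wrapped in as.integer().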
[R] Help with nls and error messages singular gradient
Hi All,

I'm trying to run nls on the data from the study by Marske (Biochemical Oxygen Demand Interpretation Using Sum of Squares Surface. M.S. thesis, University of Wisconsin, Madison, 1967), as reported in Bates and Watts (1988).

The data are as follows (stored as mydata):

  time  bod
1    1 0.47
2    2 0.74
3    3 1.17
4    4 1.42
5    5 1.60
6    7 1.84
7    9 2.19
8   11 2.17

I then run the following:

    # Plot initial curve
    plot(mydata$time, mydata$bod, xlab="Time (in days)", ylab="biochemical oxygen demand (mg/l)")
    model <- nls(bod ~ beta1/(1 - exp(beta2*time)), data = mydata,
                 start = list(beta1 = 3, beta2 = -0.1), trace = T)

The start values are the recommended ones (I have used them in SPSS without any problems; SPSS returns Beta1 = 2.4979 and Beta2 = -2.02 456), but nls returns the error message

    Error in nls(bod ~ beta1/(1 - exp(beta2 * time)), data = mydata, start = list(beta1 = 3, :
      singular gradient

Can anyone offer any advice?

Thanks in advance
Mike
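One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the BOD model is usually written as a product, beta1 * (1 - exp(beta2 * time)), whereas the call above divides by the bracket. The sketch below fits the product form to the same data with the same start values; it is an illustration, not necessarily what SPSS was given.

```r
# Data as listed in the post.
mydata <- data.frame(
  time = c(1, 2, 3, 4, 5, 7, 9, 11),
  bod  = c(0.47, 0.74, 1.17, 1.42, 1.60, 1.84, 2.19, 2.17)
)

# Product form of the model (assumed to be the intended one);
# beta2 is negative so the curve rises towards the asymptote beta1.
model <- nls(bod ~ beta1 * (1 - exp(beta2 * time)),
             data = mydata,
             start = list(beta1 = 3, beta2 = -0.1))

print(coef(model))
```

With the product form the fitted curve increases towards beta1, which matches the shape of the data; the quotient form decreases with time for negative beta2, so the start values are far from anything that could fit.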
[R] Viewing Function Code
Hi All,

I'd like to see the function code behind the barplot2() function in the gplots package, but I have hit a stumbling block: the method is hidden. Can anyone help?

    > library(gplots)
    > methods(barplot2)
    [1] barplot2.default*

       Non-visible functions are asterisked
    > barplot2
    function (height, ...)
    UseMethod("barplot2")

Mike
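Non-visible S3 methods can generally be retrieved with getS3method(), getAnywhere(), or the ::: operator. The sketch below uses t.test.default from the stats package as a stand-in so that it runs without gplots installed; the equivalent calls for the case in the post are shown as comments.

```r
# Stand-in example: t.test.default is a registered but unexported method.
m1 <- getS3method("t.test", "default")   # look up by generic and class
m2 <- getAnywhere("t.test.default")      # search all loaded namespaces
m3 <- stats:::t.test.default             # reach into the namespace directly

# Printing any of these shows the source, e.g. print(m1).

# For the function asked about (requires the gplots package):
#   getAnywhere("barplot2.default")
#   gplots:::barplot2.default
```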
[R] List mappings and variable creation
Hi All,

I have a question about associative list mappings in R, and whether they are possible.

I have data in the form shown below and want to make a new 'bucket' called combined, which is the sum of the control and the exposed metric values. This combined variable is a many-to-many matching, as values only appear in the file if they are > 0.

conversion.type filteredID bucketID Metric Value
counter         true       control  a      1
counter         true       control  b      1
counter         true       control  c      2
counter         true       control  d      3
counter         true       exposed  a      4
counter         true       exposed  e      1

ASIDE: At the minute I read the data in and then create all the 'missing' rows, in this case

counter         true       control  e      0
counter         true       exposed  b      0
counter         true       exposed  c      0
counter         true       exposed  d      0

and then sort the data, count the number of times control appears, and use this as an index matcher.

    saw.aggr.data <- saw.aggr.data[order(saw.aggr.data$bucketID, saw.aggr.data$metric), ]
    no.of.metrics <- length(saw.aggr.data$bucketID[grep("control", saw.aggr.data$bucketID)])
    for (i in (1:no.of.metrics)) {
      assign(paste("combined", as.character(saw.aggr.data$metric[i])),
             (saw.aggr.data$value[i] + saw.aggr.data$value[i + no.of.metrics]))
    }

This does what I want, but it is very fragile and could be open to large errors (error handling is currently done by checking that the name of metric[i] equals the name of metric[i + no.of.metrics]).

Is there a more robust way of doing this using some kind of list mapping? I've looked at the older threads in this area and it looks like something that should be possible, but I can't figure out how to do it.

Ideally I'd like a final dataset / list of the following form:

conversion.type filteredID bucketID Metric Value
counter         true       control  a      1
counter         true       control  b      1
counter         true       control  c      2
counter         true       control  d      3
counter         true       exposed  a      4
counter         true       exposed  e      1
counter         true       combined a      5
counter         true       combined b      1
counter         true       combined c      2
counter         true       combined d      3
counter         true       combined e      1

so that I don't have to create the dummy rows. Does this make sense?

Many thanks in advance
Mike
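One way to avoid the dummy rows and the positional indexing is to sum Value within each metric and append the result as a new bucket. This is a sketch only: the data frame below retypes the example from the post, and the column names follow the printed header rather than the lower-case names used in the loop.

```r
saw.aggr.data <- data.frame(
  conversion.type = "counter",
  filteredID = "true",
  bucketID = c("control", "control", "control", "control", "exposed", "exposed"),
  Metric = c("a", "b", "c", "d", "a", "e"),
  Value = c(1, 1, 2, 3, 4, 1),
  stringsAsFactors = FALSE
)

# Sum over buckets for each metric; metrics present in only one bucket
# are summed on their own, so no zero-filled rows are needed.
combined <- aggregate(Value ~ conversion.type + filteredID + Metric,
                      data = saw.aggr.data, FUN = sum)
combined$bucketID <- "combined"

# Put the columns back in the original order and append.
result <- rbind(saw.aggr.data, combined[names(saw.aggr.data)])
print(result)
```

Matching is done on the metric name itself, so the order of the rows in the input no longer matters.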
[R] Numeric formatting question
Hi All,

I am using Sweave and \Sexpr{} to place some numeric variables in my tex document. I want to format the numbers beforehand so they read slightly more elegantly. Say I have the following numbers

    x <- 0.00487324
    y <- 0.00432
    z <- 0.567

I would like to have the numbers displayed as follows

    x1 <- 0.0049
    y1 <- 0.0043
    z1 <- 0.57

I've seen that I can use sprintf("%.3f", pi), for example, to format the digits after the decimal point, but I can't figure out an elegant way to find the position of the first non-zero digit so that I can substitute it into the sprintf command.

Can anyone offer any advice?

Thanks in advance
Mike
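The desired output looks like rounding to two significant digits, in which case the position of the first non-zero digit does not need to be found by hand. A small sketch, assuming that reading is right: signif() does the rounding numerically and formatC() with format = "fg" produces the corresponding strings.

```r
vals <- c(x = 0.00487324, y = 0.00432, z = 0.567)

# Numeric rounding to 2 significant digits.
rounded <- signif(vals, digits = 2)

# Character version, formatted element by element; with format = "fg"
# the digits argument counts significant digits.
shown <- formatC(vals, digits = 2, format = "fg")

print(rounded)
print(shown)
```

Inside a document this could then be used as \Sexpr{formatC(x, digits = 2, format = "fg")}.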
[R] Title splitting function
Hi All,

I'm trying to write a function to automatically split long strings so they will appear nicely in a chart I'm trying to create. Say I have a string

    title <- "some variety of words that are descriptive"

In this instance I want to place a carriage return at the space just prior to a specified number of characters (in this case 15).

    title.length <- nchar(title)
    no.splits <- floor(title.length / 15)
    space.title <- c(gregexpr("[[:space:]]", title)[[1]])

    > space.title   # This tells me the position of all spaces in the title
    [1]  5 13 16 22 27 31
    > no.splits     # This tells me how many carriage returns I will need
    [1] 2
    > title.length  # This tells me the total length of the title string
    [1] 42

I can then check where the last space before each break point is, i.e. where I should make the break, using (split number * characters), i.e.

    > 15 < space.title  ## (15 * 1 split)
    [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    > 30 < space.title  ## (15 * 2 splits)
    [1] FALSE FALSE FALSE FALSE FALSE  TRUE

(I'm guessing I need to create some loop or apply here.)

So I know I need to substitute "\n" for " " at positions 13 and 27. My final output would then appear as

    title <- "some variety\nof words that\nare descriptive"

But I'm getting stuck finding a way to work out the positions 13 and 27 dynamically and return them for later use.

Can anyone offer any advice?

Thanks All.
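If the exact break positions are not needed and only the wrapped title matters, base R's strwrap() may already do the job. The sketch below is one possible approach, not the method from the post: the helper name WrapTitle is made up, and because strwrap() breaks at the last space that still fits on the current line, the breaks can fall at different spaces than positions 13 and 27.

```r
# Hypothetical helper: wrap a title so that no line exceeds max.chars.
WrapTitle <- function(title, max.chars = 15) {
  # strwrap() keeps lines shorter than 'width', hence the + 1.
  pieces <- strwrap(title, width = max.chars + 1)
  paste(pieces, collapse = "\n")
}

title <- "some variety of words that are descriptive"
wrapped <- WrapTitle(title, max.chars = 15)
cat(wrapped, "\n")
```

The result can be passed straight to the main argument of a plotting function, since "\n" is honoured in plot titles.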
[R] Passing function arguments
Hi All,

I'm looking for some help with passing function arguments and referencing them. I've made a replica, less complicated function to show my problem and the workaround I have used. However, I suspect there is a _FAR_ better way of doing this.

If I do:

    BuildDecayModel <- function(x = "this", y = "that", data = model.data) {
      model <- nls(y ~ SSexp(x, y0, b), data = model.data)
      return(model)
    }
    ...
    "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases"

This function returns an error because the args are passed to the model as the strings "this" and "that", and so it fails (correct?).

If I do the following:

    BuildDecayModel <- function(x = "total.reach", y = "lift", data = model.data) {
      x <- data[[x]]
      y <- data[[y]]
      model.data <- as.data.frame(cbind(x, y))
      model <- nls(y ~ SSexp(x, y0, b), data = model.data)
      return(model)
    }

This works for me, but it seems that I'm missing a trick: I should be able to manipulate the args rather than build an entirely new data.frame to work from.

Can anyone offer some advice?

Thanks in advance
Mike
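A common way around this is to build the formula from the column names as text and then hand it to the fitting function together with the original data. The following is a sketch under that assumption; BuildFormula is a made-up helper, and lm() on the built-in cars data stands in for nls() with SSexp() so that the example runs without extra packages.

```r
# Hypothetical helper: substitute column names into a formula template.
BuildFormula <- function(x, y, template = "%s ~ %s") {
  as.formula(sprintf(template, y, x))
}

# Stand-in model on built-in data.
f <- BuildFormula("speed", "dist")
fit <- lm(f, data = cars)

# The same helper with a template shaped like the call in the post;
# this only builds the formula, it does not need SSexp to be loaded.
f.decay <- BuildFormula("total.reach", "lift",
                        template = "%s ~ SSexp(%s, y0, b)")
print(f.decay)
# Inside BuildDecayModel one could then call: nls(f.decay, data = data)
```

The fitted model then refers to the real column names, which also makes later predict() calls on new data straightforward.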
[R] Passing Arguments in a function
Hi All,

I'm having some trouble assigning arguments inside a function to produce a plot from a model. Can anyone help me? Below I've outlined the situation and examples of failing and working code.

Regards
Mike

## data ##

decay.data <- ...

   behaviors lift reach.uu estimated.conversions.uu total.reach
1  1 432.0083 770 7700.00
2  2 432.008329660296600.03
3  3 429.98643298500.03
4  4 427.859930320300200.03
5  5 424.010530680301100.03
6  6 418.471031180302000.03
7  7 412.002231840303600.03
8  8 405.455332340303500.03
9  9 397.008333260305600.03
10 10 393.238233600305800.03
11 11 385.543134940311800.03
12 12 384.294235050311700.03
13 13 374.029936110312600.03
14 14 363.305737290313500.03
15 15 353.118538450314200.03
16 16 342.21904316800.03
17 17 338.923240470317400.04
18 18 328.866641880318800.04
19 19 318.321945510335300.04
20 20 308.120047230336800.04

If I use:

    library(nlrwr)
    # Build a model.
    decay.model <- nls(lift ~ SSexp(total.reach, y0, b), data = decay.data)

    # Plot the model.
    plot(decay.data[["total.reach"]], decay.data[["lift"]])
    xv <- seq(min(decay.data[["total.reach"]]), max(decay.data[["total.reach"]]), 0.02)
    yv <- predict(decay.model, newdata = list(total.reach = xv))
    lines(xv, yv)

This works. If I try to wrap this in a function and pass the argument values, I fail when I reach "list(total.reach = xv)". I've tried various flavours of paste(), but again can't figure out where I am going wrong; any help appreciated.

    PlotDecayModel <- function(x = "total.reach", y = "lift", data) {
      decay.model <- BuildDecayModel(x = "total.reach", y = "lift", data = data)
      # Plot the lift vs reach plot.
      plot(data[[x]], data[[y]])
      # Add the model curve to the plot.
      xv <- seq(min(data[[x]]), max(data[[x]]), 0.02)
      yv <- predict(decay.model, newdata = list(x = xv))
      lines(xv, yv)
    }

This returns the error

    Error in xy.coords(x, y) : 'x' and 'y' lengths differ

I can see this is because predict() ignores 'newdata = list(x = xv)': it names the new column x, which doesn't exist in the model. So how can I use the arg "total.reach" so that 'newdata = list(x = xv)' is evaluated as 'newdata = list(total.reach = xv)'?

Many thanks in advance
Mike
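In list(x = xv) the name is taken literally, so the column is called "x" whatever the argument holds. Setting the name programmatically, for instance with setNames(), is one way to get the behaviour asked for. The sketch below illustrates this with lm() on the built-in cars data as a stand-in for the nls/SSexp model; PredictCurve is a hypothetical helper and the plotting calls are left as comments.

```r
# Hypothetical helper: predictions over the range of the column named by x.
PredictCurve <- function(model, x, data, n = 50) {
  xv <- seq(min(data[[x]]), max(data[[x]]), length.out = n)
  # Name the new-data column after the value of x, not the symbol "x".
  newdata <- setNames(data.frame(xv), x)
  data.frame(x = xv, y = unname(predict(model, newdata = newdata)))
}

fit <- lm(dist ~ speed, data = cars)       # stand-in model
fitted.curve <- PredictCurve(fit, "speed", cars)

# plot(cars$speed, cars$dist)
# lines(fitted.curve$x, fitted.curve$y)
```

For this to work inside PlotDecayModel, the model itself has to be fitted with the real column names in its formula rather than with copies called x and y.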
[R] R code for OptiGrid Clustering
Hi All,

Has anyone coded up the OptiGrid clustering algorithm for high-dimensional spaces? If so, is anyone willing to share?

Many thanks in advance
Mike
[R] Loading List data into R with scan()
Hi All,

I've been given a data file of the form:

    1: 3,4,5,6
    2:1,2,3
    43: 5,7,8,9,5

and I want to read this data in as a list of the form (guessing the final look):

    my.list
    [[1]]
    [1] 3 4 5 6

    [[2]]
    [1] 1 2 3

    [[43]]
    [1] 5 7 8 9 5

I can get to a stage using scan:

    scan("my.data", what = character(0), quiet = TRUE)

to load

    [1] "1: 3,4,5,6"
    [2] "2:1,2,3"
    [3] "43: 5,7,8,9,5"

but I'm not sure how to proceed from here to arrange this into a list. Can anyone offer some advice?

Thanks in advance
Mike
Re: [R] Loading List data into R with scan()
Thanks Uwe,

The list elements were a mistake on my part; I just wanted everything before the ":" to be the name of the element. Thanks for the help, I can play around with this to get what I want.

M

2011/6/23 Uwe Ligges wrote:

> I don't understand why you want 40 empty list elements, but here is what
> you asked for (not optimized, just hacked in few seconds):
>
> temp <- strsplit(d, ":")
> num <- as.numeric(sapply(temp, "[[", 1))
> L <- vector(mode = "list", length = max(num))
> for(i in seq_along(temp)){
>   L[[num[i]]] <- as.numeric(unlist(strsplit(temp[[i]][2], ",")))
> }
> L
>
> Uwe Ligges
Re: [R] Loading List data into R with scan()
Thanks All,

Henrique gave me the solution I was looking for; the indexing was a mistake on my part.

Thanks again

On 23 June 2011 16:37, David Winsemius wrote:

> On Jun 23, 2011, at 11:19 AM, Uwe Ligges wrote:
>
>> I don't understand why you want 40 empty list elements, but here is what
>> you asked for (not optimized, just hacked in few seconds):
>> [...]
>
> I wondered about that too. Perhaps he would be satisfied with alpha
> indexing:
>
> d <- c( "1: 3,4,5,6", "2:1,2,3", "43: 5,7,8,9,5")
> temp <- strsplit(d, ":")
> num <- sapply(temp, "[[", 1)
> L <- vector(mode = "list")
>
> for(i in seq_along(temp)){
>   L[[num[i]]] <- as.numeric(unlist(strsplit(temp[[i]][2], ",")))
> }
>
> > L
> $`1`
> [1] 3 4 5 6
>
> $`2`
> [1] 1 2 3
>
> $`43`
> [1] 5 7 8 9 5
>
> David Winsemius, MD
> West Hartford, CT
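The solution credited to Henrique is not included in this archive. As a hedged sketch of the same idea as the replies above, without the explicit loop: split each line at the colon, use the left part as the element name, and convert the right part to numbers. The names parts, keys and my.list are illustrative only.

```r
d <- c("1: 3,4,5,6", "2:1,2,3", "43: 5,7,8,9,5")

# Split each line into "key" and "comma separated values".
parts <- strsplit(d, ":", fixed = TRUE)
keys <- trimws(sapply(parts, `[`, 1))

# Convert the value part of each line to a numeric vector;
# as.numeric() tolerates the leading blank after the colon.
values <- lapply(parts, function(p) {
  as.numeric(strsplit(p[2], ",", fixed = TRUE)[[1]])
})

# Named list: elements are accessed as my.list[["43"]].
my.list <- setNames(values, keys)
print(my.list)
```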
[R] Using Match in a lookup table
Hi All,

I'm having a few problems using match and a lookup table. Previous Googling shows numerous solutions for matching a lookup table to a dataset. My situation is slightly different, as I have multiple lookup tables (that I cannot merge, for integrity reasons) that I wish to match against my data, and each of these files is large, so lots of for / if conditions are not ideal (notwithstanding that I have multiple files, of course).

For example, I have data:

    > v <- c("foo", "foo", "bar", "bar", "baz")
    > id <- c(1,2)
    > id2 <- c(3)
    > name <- c("foo", "bar")
    > name2 <- c("baz")
    > df1 <- data.frame(id, name)
    > df2 <- data.frame(id2, name2)
    > v <- df1$id[match(v,df1$name)]
    > v
    [1]  1  1  2  2 NA

So here I actually want to return

    > v
    [1] 1 1 2 2 "baz"

so that next time I can run

    v <- df2$id[match(v,df2$name)]

and return:

    > v
    [1] 1 1 2 2 3

Any help very much appreciated
Mike
Re: [R] Using Match in a lookup table
Thanks for the idea David,

My problem comes from having (say) up to 10 different match files, so nested ifelse, whilst it would work, doesn't seem an elegant solution. However, if needs must...

Mike

On 28 June 2011 14:39, David Winsemius wrote:

> On Jun 28, 2011, at 6:18 AM, Michael Pearmain wrote:
>
>> v <- df1$id[match(v,df1$name)]
>> v
>> [1] 1 1 2 2 NA
>
> A numeric vector.
>
>> So here i actually want to return
>> [1] 1 1 2 2 "baz"
>
> Not possibly a numeric vector.
>
>> so next time i can run
>> v <- df2$id[match(v,df2$name)]
>> and return:
>> [1] 1 1 2 2 3
>
> You need to keep track of the successful matches in df1 and then you
> probably want to interleave them with matches in df2. Perhaps instead use
> ifelse to do the work for you:
>
> > ifelse(!is.na(match(v,df1$name)), match(v,df1$name), match(v,df2$name2) )
> [1] 1 1 2 2 1
>
> David Winsemius, MD
> West Hartford, CT
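With many lookup tables, an alternative to nesting ifelse is to keep the original keys untouched and fill a separate result vector, table by table, only where it is still NA. This is a sketch of that idea, not something proposed in the thread; it assumes every table uses the same column names (id and name), and the list lookups is hypothetical.

```r
v <- c("foo", "foo", "bar", "bar", "baz")

# The lookup tables stay separate; they are only collected in a list.
lookups <- list(
  data.frame(id = c(1, 2), name = c("foo", "bar"), stringsAsFactors = FALSE),
  data.frame(id = 3, name = "baz", stringsAsFactors = FALSE)
)

ids <- rep(NA_real_, length(v))
for (tab in lookups) {
  todo <- is.na(ids)                      # keys not resolved so far
  ids[todo] <- tab$id[match(v[todo], tab$name)]
}
print(ids)
```

Each pass only looks at the keys that are still unresolved, so an earlier table takes precedence when a name occurs in more than one of them.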
[R] Convert ragged list to structured matrix efficiently
Hi All,

I want to convert a ragged list of values into a structured matrix for further analysis later on. I have a solution to this problem (below), but I'm dealing with datasets up to 1 GB in size (I have 24 GB of memory, so I can load them), and it takes a LONG time to run the code on a large dataset. I was wondering if anyone had any tips or tricks that may make this run faster?

Below is some sample code of what I've been doing (in the full version I use snowfall to spread the work via sfSapply).

    bhvs <- c(1,2,3,4,5,6)
    ragged.list <- list('23' = c(13,4,5,6,3,65,67,2),
                        '34' = c(1,2,3,4,56,7,8),
                        '45' = c(5,6,89,87,56))

    # Define the matrix to store results
    cluster.data <- as.data.frame(matrix(0, length(bhvs), nrow = length(ragged.list)))
    # Keep the names of the bhvs
    names(cluster.data) <- bhvs

    cluster.data <- t(sapply(rep(1:length(ragged.list)), function (i) {
      cluster.data[i,] <- as.numeric(names(cluster.data) %in% (ragged.list[[i]][]))
      return(cluster.data[i,])
    }))

    cluster.data <- matrix(unlist(cluster.data),
                           ncol = ncol(cluster.data),
                           dimnames = list(NULL, colnames(cluster.data)))

    > cluster.data
         1 2 3 4 5 6
    [1,] 0 1 1 1 1 1
    [2,] 1 1 1 1 0 0
    [3,] 0 0 0 0 1 1

The returned matrix is as I desire it, with the bhvs as the colnames and a binary value in each row representing whether it was present in that list element or not.

Many thanks in advance
Mike
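Much of the running time in the code above probably comes from indexing a data frame row by row inside the loop. A sketch of a leaner variant that builds the numeric matrix directly with vapply() is given below; it is an assumption that this is where the time goes, and no timings are claimed.

```r
bhvs <- c(1, 2, 3, 4, 5, 6)
ragged.list <- list('23' = c(13, 4, 5, 6, 3, 65, 67, 2),
                    '34' = c(1, 2, 3, 4, 56, 7, 8),
                    '45' = c(5, 6, 89, 87, 56))

# One 0/1 column per list element, then transpose so that each list
# element becomes a row; no data frame is involved at any point.
cluster.data <- t(vapply(ragged.list,
                         function(v) as.numeric(bhvs %in% v),
                         numeric(length(bhvs))))
colnames(cluster.data) <- bhvs
print(cluster.data)
```

Unlike the original, the rows keep the names of the list elements; dimnames can be dropped with unname() if they are not wanted.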
[R] missing value where TRUE/FALSE needed
Merry Xmas to all,

I am writing a function and, curiously, it runs on one data set and fails on another, and I cannot figure out why. Any help much appreciated.

If I run the code below with

    data <- iris[ ,1:4]

the code runs fine, but if I run it on a large dataset I get the following error (showing data structures, as the matrix is large):

    > str(cluster.data)
     num [1:9985, 1:811] 0 0 0 0 0 0 0 0 0 0 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:811] "1073949105" "1073930585" "1073843224" "1073792624" ...
    # (This is intended to be chr)
    > str(iris)
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    > str(as.matrix(iris[,1:4]))
     num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

    n.cols <- ncol(data)
    n.rows <- nrow(data)
    X <- as.matrix(data)
    stepsize <- 0.05
    c1 <- (2 * pi) ** (n.cols / 2)
    c2 <- n.rows * (smoothing ** (n.cols + 2))
    c3 <- n.rows * (smoothing ** n.cols)

    Kexp <- function(sqs){
      return (exp((-1 * sqs) / (2 * smoothing ** 2)))
    }

    FindGradient <- function(x){
      XmY <- t(x - t(X))
      sqsum <- rowSums(XmY * XmY)
      K <- sapply(sqsum, Kexp)
      dens <- ((c1 * c3) ** -1) * sum(K)
      grad <- -1 * ((c1 * c2) ** -1) * colSums(K * XmY)
      return (list(gradient = grad,
                   density = dens))
    }

    attractors <- matrix(0, n.rows, n.cols)
    densities <- matrix(0, n.rows)

    > density.attractors <-
        sapply(rep(1:n.rows), function(i) {
          notconverged <- TRUE
          # For each row loop through and find the attractor and density value.
          x <- (X[i, ])
          iters <- as.integer(1)
          # Run gradient ascent for each point to obtain x*
          while(notconverged == TRUE) {
            find.gradient <- FindGradient(x)
            next.x <- x + stepsize * find.gradient$gradient
            change <- sqrt(sum((next.x - x) * (next.x - x)))
            notconverged <- ifelse(change > tol, TRUE, FALSE)
            x <- next.x
            iters <- iters + 1
          }
          # store the attractor and density value
          return(c(densities[i, ] <- find.gradient$density,
                   attractors[i, ] <- x))
        })

    Error in while (notconverged == TRUE) { :
      missing value where TRUE/FALSE needed

Any help would be great
Mike
Re: [R] missing value where TRUE/FALSE needed
Apologies, I was using tol = 0.0001.

I had looked at the browser and it did show notconverged = NA. But I couldn't understand why it worked for one dataset and not the other.

On Friday, 23 December 2011, jim holtman wrote:
> Does this look similar to the error you are getting:
>
>> while(NA == TRUE) 1
> Error in while (NA == TRUE) 1 : missing value where TRUE/FALSE needed
>
> So 'notconverged' is probably equal to NA. BTW, what is the value of
> 'tol'? I do not see it defined. So when computing 'notconverged' you
> have generated an NA. You can test it to see if this is true.
>
> You can use the following command:
>
> options(error=utils::recover)
>
> and then learn how to use the 'browser' to examine variables when the
> error occurs.
>
> --
> Jim Holtman
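One candidate cause worth checking, not confirmed anywhere in the thread: with 811 columns the constant c1 = (2 * pi) ** (n.cols / 2) overflows to Inf, and a small bandwidth can make c2 underflow to 0, so their product is NaN. The sketch below reuses the dimensions from the str() output; the value of smoothing is an assumption, since the post never states it, and check_change is a hypothetical helper.

```r
# Dimensions from the str() output in the post; 'smoothing' is a
# hypothetical value, since the thread never states it.
n.rows <- 9985
n.cols <- 811
smoothing <- 0.1

c1 <- (2 * pi)^(n.cols / 2)            # overflows to Inf
c2 <- n.rows * smoothing^(n.cols + 2)  # underflows to 0
inv_const <- (c1 * c2)^-1              # Inf * 0 is NaN, and so is its inverse

# Any gradient scaled by inv_const is NaN, so the step size is NaN too
change <- sqrt(sum((inv_const * c(1, 2))^2))
notconverged <- change > 1e-4          # NA, which while() cannot use

# Hypothetical guard: fail early with a readable message
check_change <- function(change) {
  if (is.na(change)) stop("'change' is NA/NaN; inspect the gradient and constants")
  change
}
```

With the four iris columns none of these constants overflow, which would be consistent with the code running there. Working on the log scale, or dropping constants that do not affect the direction of the gradient, are possible ways around it.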
[R] Assign and cmpfun
Hi All,

I've just recently discovered the cmpfun function, and I wanted to create a function that applies it to all the functions I have created, without explicitly naming them.

I've achieved this with:

foo <- function(x) { print(x)}
bar <- function(x) { print(x + 1)}

> foo <- function(x) { print(x)}
> foo
function(x) { print(x)}
> cmpfun(foo)
function(x) { print(x)}

find.all.functions <- ls.str(mode = 'function')
for(i in seq_along(find.all.functions)) {
  assign(find.all.functions[i], cmpfun(get(find.all.functions[i])))
}

But I remember being told that using assign is generally a bad idea, and ideally I want to wrap this in a function, something like:

CreateCompiledFunctions <- function() {
  find.all.functions <- ls.str(mode = 'function')
  for(i in seq_along(find.all.functions)) {
    assign(find.all.functions[i], cmpfun(get(find.all.functions[i])))
  }
}

Does anyone have a better solution?

Thanks in advance

Mike
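A minimal sketch of one way to wrap this in a function, assuming the functions live in a known environment. The helper name compile_all_functions and the demo environment are hypothetical. The main point is that both the lookup and assign() take an explicit environment; inside a function they otherwise default to the function's own frame, so the originals would not be replaced.

```r
library(compiler)

# Hypothetical helper: byte-compile every function found in 'env'
# and store the compiled version back under the same name.
compile_all_functions <- function(env = globalenv()) {
  all_names <- ls(env)
  is_fn <- vapply(all_names,
                  function(nm) is.function(get(nm, envir = env)),
                  logical(1))
  fn_names <- all_names[is_fn]
  for (nm in fn_names) {
    assign(nm, cmpfun(get(nm, envir = env)), envir = env)
  }
  invisible(fn_names)
}

# Demo in a scratch environment so the global workspace is untouched
demo_env <- new.env()
assign("foo", function(x) print(x), envir = demo_env)
assign("bar", function(x) print(x + 1), envir = demo_env)
assign("not_a_function", 42, envir = demo_env)
compiled <- compile_all_functions(demo_env)
```

Note that recent R versions (3.4.0 onwards) enable the JIT compiler by default, so explicit cmpfun calls may bring little extra benefit there.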
[R] Merging two dataframes
Hi All,

Newbie question for you all, but I have been looking at the archives and the help pages to get a rough idea of what I want to do.

I would like to merge two dataframes based on a key variable in one dataframe that links to the other. Only some of the cases will match, but I would like to keep the others as well.

My dataframes have 67 and 28 cases respectively, and I would like to end up with one file 67 cases long (all 28 are matched cases).

I can use the merge command to merge the two datasets, but I still get some odd results. I'm using the code below:

ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases, while the ETC data file only has 67 and the SURVEY file only has 28. (Reading the help, it looks as if it merges each case with all cases in the other file, which is not what I want.)

The matching fields are the 'ord' field (in ETC) and the 'uid' field (in SURVEY).

Can anyone advise please?

--
Michael Pearmain
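A minimal sketch of the left join described above, using toy stand-ins for ETC and SURVEY (only the key names ord and uid come from the post; the other columns are made up). merge() takes the keys as by.x and by.y; names such as by.ETC are not arguments it recognises, so they appear to be silently ignored, and with no shared column names the result is a cross join, which would fit the roughly 67 x 28 rows reported.

```r
# Toy stand-ins; only the key names 'ord' and 'uid' come from the post
ETC <- data.frame(ord = 1:5, spend = c(10, 20, 30, 40, 50))
SURVEY <- data.frame(uid = c(2L, 4L), answer = c("yes", "no"))

# by.x is the key in the first data frame, by.y the key in the second;
# all.x = TRUE keeps ETC rows without a match and fills them with NA
FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = TRUE)

# No shared column names and no valid 'by': every row pairs with every row
cross <- merge(ETC, SURVEY)
```

Here FullData has one row per ETC case, while cross has nrow(ETC) * nrow(SURVEY) rows.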
[R] Missing Data and applying
Hi All,

Newbie question that I'm sure is easy, but I can't seem to apply it properly.

I read in a dataframe from a CSV file, and I want to tell R that from column "n_0" to "n_32" the value "-1" is missing data. I was looking at the

is.na(xx) <- c(..,...,)

idea, but I can't seem to apply it properly. Can anyone offer advice?

On a side issue, while I'm asking: I have an XML file that I intend to use to add value labels and variable labels to the dataframe (using a Python script), but I can't seem to find the syntax for adding value labels, i.e. 1=Male 2=Female. The labels command doesn't look like the one I want to use, and I've searched the archives but to no avail (maybe it's too simple, but I have looked).

Any help willingly accepted
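A minimal sketch of both steps in base R, on a toy data frame (the real block would run from n_0 to n_32; the sex column is a made-up example of a coded variable). Value labels are approximated with factor levels, which is the usual base R equivalent.

```r
# Toy data; the real columns would run from n_0 to n_32
df <- data.frame(id  = 1:4,
                 n_0 = c(1, -1, 2, 1),
                 n_1 = c(-1, 2, 2, 1),
                 sex = c(1, 2, 2, 1))

# Pick the block of columns by position of the first and last name,
# then recode -1 to NA inside that block only
cols <- which(names(df) == "n_0"):which(names(df) == "n_1")
df[cols][df[cols] == -1] <- NA

# "Value labels": turn the codes into a labelled factor
df$sex <- factor(df$sex, levels = c(1, 2), labels = c("Male", "Female"))
```

If the -1 codes are missing in every column, read.csv(..., na.strings = "-1") at import time would be a simpler route.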
[R] Calling functions
Another newbie question.

I've written a function and saved the file as Xtabs.R in a central place on a network so others will be able to use the function.

My question is: how do I call this function? I've tried to change the working directory, and tried to load it via

> library(Xtabs, lib.loc="//filer/common/technical/surveys/R_test")

but neither seems to work. The function inside is called CrossTable.

Many thanks in advance
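A minimal sketch of loading a plain script file: library() only works for installed packages, whereas source() evaluates an .R file and defines whatever functions it contains. The example writes a stand-in Xtabs.R to a temporary directory; the one-line CrossTable body is hypothetical, not the poster's function.

```r
# Stand-in for the shared file: a tiny Xtabs.R in a temporary directory.
# The CrossTable body here is hypothetical.
shared_dir <- tempdir()
script <- file.path(shared_dir, "Xtabs.R")
writeLines("CrossTable <- function(x, y) table(x, y)", script)

# source() runs the file, which defines CrossTable in the workspace
source(script)
tab <- CrossTable(c("a", "b", "a"), c("u", "u", "v"))
```

For the network location in the post, the equivalent call would presumably be source("//filer/common/technical/surveys/R_test/Xtabs.R"), assuming the file sits directly in that folder.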
[R] Z test and proportions
Hi All,

I have a table based on ordinal data and I want to compare proportions. I've seen that in the pwr package I can use

power.prop.test

however, I want to find out what the significance value is based on n1, n2, p1, p2, and this package doesn't contain this.
Does anyone know of a package that does, or is it a case of writing a function specifically for this?

Many thanks in advance
Re: [R] Z test and proportions
Yes, my mistake. I looked at pwr.2p2n.test, but I cannot supply both n's and both p values to determine the significance value, e.g.

pwr.2p2n.test(h = , n1 = , n2 = , sig.level = , power = )

or am I missing something obvious?

I did the same in SPSS using a macro and the following code:

COMPUTE n1 = Control_MAX .
COMPUTE n2 = Exposed_max.
COMPUTE x1 = Control.
COMPUTE x2 = Exposed.
COMPUTE p1 = x1/n1.
COMPUTE p2 = x2/n2.
COMPUTE phat = (x1 + x2) / (n1 + n2).
COMPUTE SE_phat = SQRT(phat * (1 - phat) * ((1/n1) + (1/n2))).
COMPUTE z = (p1 - p2) /SE_phat.
COMPUTE SIGz_2TL = 2 * (1 - CDFNORM(ABS(z))).
COMPUTE SIGz_LTL = CDFNORM(Z).
COMPUTE SIGz_UTL = 1 - CDFNORM(Z).
COMPUTE SIG_Level = ABS(1-(1-CDFNORM(z))*2).
Compute p1p = p1*100.
Compute p2p = p2*100.
compute diff = p2p-p1p.
EXE.
Var lab p1p "Control Group %".
Var lab p2p "Exposed Group %".

On Tue, Jun 17, 2008 at 5:13 PM, Peter Dalgaard <[EMAIL PROTECTED]> wrote:
> I think your wires got crossed somewhere:
>
> power.prop.test is not from the pwr package; however, pwr does contain
> pwr.2p2n.test, which looks like it does exactly what you want!
>
> --
> Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen

--
Michael Pearmain
Senior Statistical Analyst
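A minimal sketch of the SPSS computation above translated to R: a pooled two-proportion z-test returning z and the two-sided p-value. The function name two_prop_z and the counts are made up for illustration. Base R's prop.test() without continuity correction should give the same two-sided p-value, so a custom function may not be needed at all.

```r
# Pooled two-proportion z-test, following the SPSS formulas:
# phat is the pooled proportion, se its standard error under H0.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  phat <- (x1 + x2) / (n1 + n2)
  se <- sqrt(phat * (1 - phat) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  c(z = z, p_two_sided = 2 * (1 - pnorm(abs(z))))
}

# Made-up counts: 30 of 100 in the control group, 45 of 120 exposed
res <- two_prop_z(x1 = 30, n1 = 100, x2 = 45, n2 = 120)

# Built-in equivalent: the chi-squared statistic equals z squared
ref <- prop.test(c(30, 45), c(100, 120), correct = FALSE)
```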
[R] Problems with basic loop
I'm having trouble creating a looping variable and I can't see where the problem arises from. Any help gratefully appreciated.

First create a table:

x<-table(SURVEY$n_0,exposed)
> x
          exposed
           False True
  Under 16    24    1
  16-19      689
  20-24      190   37
  25-34      555  204
  35-44      330   87
  45-54      198   65
  55-64       67   35
  65+         10    8

Now vectors to store counts and column proportions:

> xT<-x[,"True"]
> xF<-x[,"False"]
> yT<-x[,"True"]/colSums(x)
> yF<-x[,"False"]/colSums(x)

Check length for dynamic looping:

> length(yT)
[1] 8

Now create the loop:

> for(i in 1:length(yT)){
+ pwr.2p2n.test(2*(asin(sqrt(yT[i]))-asin(sqrt(yF[i]))),n1=xT[i],n2=xF[i])
+ }
Error in pwr.2p2n.test(2 * (asin(sqrt(yT[i])) - asin(sqrt(yF[i]))), n1 = xT[i], :
  number of observations in the first group must be at least 2

This confuses me, as the procedure works if I enter the data as values.

Thanks in advance
Re: [R] Problems with basic loop
Thanks for the reply Peter,

I did just see that I had put the first error message in (agreed, rather an obvious error) and not the second one I received:

Warning message:
In asin(sqrt(yF[i])) : NaNs produced

The reason I'm looking at this is advert exposure, True and False.

I'm inspecting age to assess whether or not to weight the data in order to normalise groups for later questions. The questions that I am looking at later on are not scale-based questions, so I cannot perform t-tests on these; I thought the only viable way was to look at z-tests for proportions to check for post-hoc differences.

Any advice on other methods would be gratefully taken.

On Fri, Jun 20, 2008 at 11:14 AM, Peter Dalgaard <[EMAIL PROTECTED]> wrote:
> Er, the first row "under 16" has a count of 1 in the "True" column and
> it confuses you that you get an error saying that you need at least 2??
>
> But what looks _really_ confused is what you are trying to do in the
> first place: The p's you are passing to pwr.2p2n are the empirical
> relative frequencies of the individual age groups. This sort of reverses
> cause and effect (presumably the exposure does not cause middle age) and
> it is pretty odd to compare a particular row in a table with everything
> else jumbled together but worse, it is post-hoc power calculation, which
> is just a plain Bad Idea (as several people have pointed out before).
>
> --
> Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen

--
Michael Pearmain
Senior Statistical Analyst
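A possible source of the "NaNs produced" warning, offered as a guess rather than something stated in the thread: x[,"False"]/colSums(x) divides a column of 8 counts by a vector of 2 totals, so the totals are recycled alternately and some "proportions" can exceed 1. The sketch below uses made-up counts to show the effect and the prop.table() alternative; it does not address the statistical objections raised in the reply.

```r
# Made-up counts (not the table from the post): 4 age bands x 2 exposure groups
x <- matrix(c(50, 200, 30, 20,
               5,  40, 10,  5),
            ncol = 2,
            dimnames = list(c("A", "B", "C", "D"), c("False", "True")))

# colSums(x) has length 2, so it is recycled down the rows:
# row 1 is divided by the False total, row 2 by the True total, and so on
wrong <- x[, "False"] / colSums(x)

# Column proportions computed within each column
right <- prop.table(x, margin = 2)

# A "proportion" above 1 makes asin(sqrt(.)) return NaN
bad <- suppressWarnings(asin(sqrt(wrong[2])))
```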
[R] Deleting multiple variables
Hi All,

I have searched the web for a simple solution but have been unable to find one. Can anyone recommend a neat way of deleting multiple variables? I see I need to use

dataframe$VAR<-NULL

to get rid of one variable. In my situation I need to delete all vars between two points. I've used the 'which' function to find these and have assigned the result to myvars:

> myvars
[1]  2 17

but I can't figure out how I should apply this. Should I loop through the values? (Pseudo code below.)

for (x in c(myvars[1]:myvars[2])) (M_UC$x<-NULL)

Any help gratefully received

Mike
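A minimal sketch of dropping a contiguous block of columns without a loop, using negative indexing. The toy data frame is made up; only the names M_UC and myvars come from the post.

```r
# Toy data frame with 20 columns named V1..V20
M_UC <- as.data.frame(matrix(1:40, nrow = 2))
myvars <- c(2, 17)

# Negative indices drop columns; drop = FALSE keeps a data frame
# even if only one column is left
to_drop <- seq(myvars[1], myvars[2])
M_UC <- M_UC[, -to_drop, drop = FALSE]
kept <- names(M_UC)
```

The loop in the post would not work as written because M_UC$x looks for a column literally named "x" rather than using the value of x.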
[R] Conditional
Hi All,

I'm having trouble selecting rows to delete that I can't seem to overcome.
Below is some sample data. I am trying to dedup the data based on each user and, simultaneously, the timestamp (at the side I have marked the rows I expect to be removed).

I've looked at the lag function but can't seem to make it work. My logic ran along the lines of an ifelse statement, removing the flagged rows after that, but it doesn't seem to work.

Let's call the data test:

test$lag <- ifelse(test$user_id==lag(test$user_id) & test$timestamp==lag(test$timestamp),1,0)

Can anyone help on this?

Mike

      Source_type timestamp           user_id
75381 0           07-07-2008-21:03:55 848307909687
75379 1           07-07-2008-19:52:55 848307838407
75380 2           07-07-2008-19:54:14 848307838407
75378 1           07-07-2008-15:24:01 848285633277
75374 1           07-07-2008-13:39:17 848273633667
75377 2           07-07-2008-13:39:55 848273633667
75376 2           07-07-2008-13:39:55 848273633667   Remove
75375 2           07-07-2008-13:56:05 848273633667
75373 1           07-07-2008-17:11:00 848272661427
75371 1           07-07-2008-13:19:00 848270431847
75372 2           07-07-2008-13:19:14 848270431847
75369 1           07-07-2008-12:49:16 848269676907   Remove
75370 2           07-07-2008-12:49:16 848269676907
75366 1           07-07-2008-13:29:15 848263484847
75368 2           07-07-2008-13:29:44 848263484847

Thanks in advance
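A minimal sketch of the deduplication using duplicated() on the user and timestamp columns, on four rows copied from the sample data. Note that duplicated() flags the second and later occurrences, so which of two identical rows survives depends on row order. As far as I know, stats::lag() is meant for time-series objects and does not shift a plain vector, which would explain why the ifelse approach had no effect.

```r
# Four rows taken from the sample data (rows 75374, 75377, 75376, 75375)
test <- data.frame(
  Source_type = c(1, 2, 2, 2),
  timestamp = c("07-07-2008-13:39:17", "07-07-2008-13:39:55",
                "07-07-2008-13:39:55", "07-07-2008-13:56:05"),
  user_id = rep("848273633667", 4),
  stringsAsFactors = FALSE
)

# TRUE for every row whose (user_id, timestamp) pair was already seen
dup <- duplicated(test[, c("user_id", "timestamp")])
deduped <- test[!dup, ]
```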
[R] Help with aggregation
Hi All,

I seem to be having a few troubles with aggregating data back onto the dataframe. I want to take the max value for a user, and then apply this max value back against all ids that match (i.e. a one-to-many matching).

Can anyone offer any advice? Is there a better way of doing this?
Dummy data and code are listed below.

The dataset is called Mcookie:

user_id c_we_conversion
1       1
1       0
1       0
2       1
2       1
3       0
3       0

New data:

user_id c_we_conversion c_we_conversion
1       1               1
1       0               1
1       0               1
2       1               1
2       1               1
3       0               0
3       0               0

library(Hmisc)
myAgg<-summarize(Mcookie$c_we_conversion, by=Mcookie$user_id, FUN=max, na.rm=TRUE)
names(myAgg)<- c("user_id","c_we_converter")
Mcookie <- merge(Mcookie, myAgg, by.x = "user_id", by.y = "user_id")

Thanks in advance,

Mike
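A minimal sketch of the same result with base R's ave(), which applies a function within groups and returns a vector aligned with the original rows, so no separate aggregate-and-merge step is needed. The data are the dummy rows from the post.

```r
# Dummy data from the post
Mcookie <- data.frame(user_id         = c(1, 1, 1, 2, 2, 3, 3),
                      c_we_conversion = c(1, 0, 0, 1, 1, 0, 0))

# Per-user maximum, repeated on every row of that user
Mcookie$c_we_converter <- ave(Mcookie$c_we_conversion,
                              Mcookie$user_id,
                              FUN = function(v) max(v, na.rm = TRUE))
```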
[R] Time conversion
I'm trying to convert a variable that is imported from CSV into a datetime. I'm trying to use the strptime function but with no joy. Can anyone offer any advice?

I have a vector, timestamp:

07-07-2008-21:03:55
07-07-2008-19:52:55
07-07-2008-19:54:14
07-07-2008-15:24:01
07-07-2008-13:39:17
07-07-2008-13:39:55

timestamp<-strptime(timestamp,"%d-%m-%y-%H:%M:%S")
## then filter on the datetime
time<-ifelse(timestamp> "07-08-2008-00:00:00", TRUE, FALSE)

--
Michael Pearmain
Senior Statistical Analyst
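A minimal sketch of the conversion, assuming the strings are day-month-year as in the format string from the post. Two things differ from the original attempt: %Y (four-digit year) replaces %y (two-digit year), and the comparison is made against a parsed date-time rather than a character string. The cutoff value and the UTC time zone are assumptions for the example.

```r
timestamp <- c("07-07-2008-21:03:55",
               "07-07-2008-19:52:55",
               "07-07-2008-13:39:55")

# %Y matches "2008"; %y would only consume "20" and the parse would fail
parsed <- as.POSIXct(timestamp, format = "%d-%m-%Y-%H:%M:%S", tz = "UTC")

# Hypothetical cutoff, parsed to the same class before comparing
cutoff <- as.POSIXct("2008-07-07 15:00:00", tz = "UTC")
after_cutoff <- parsed > cutoff
```

The comparison already returns a logical vector, so wrapping it in ifelse(..., TRUE, FALSE) is not needed.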
[R] Timestamps and manipulations
Hi All,

I've a couple of questions I've been struggling with using the time features. Can anyone help?

Sample data:

Timestamp       user_id
27/05/08 22:57  763830873067
27/05/08 23:00  763830873067
27/05/08 23:01  763830873067
27/05/08 23:01  763830873067
05/06/08 11:34  763830873067
29/05/08 23:08  765253440317
29/05/08 23:06  765253440317
29/05/08 22:52  765253440317
29/05/08 22:52  765253440317
29/05/08 23:04  765253440317
27/06/08 19:34  765253440317
09/07/08 15:45  765329002557
06/07/08 19:24  765329002557
09/07/08 15:46  765329002557
07/07/08 13:05  765329002557
16/05/08 22:40  765329002557
08/06/08 11:24  765329002557
08/06/08 12:33  765329002557

My first question is: how can I create a new variable that acts as a filter based on a date? I've tried as.POSIXct with strptime, as below, but to no avail. Can anyone give any advice?

> Mcookie$timestamp <- as.POSIXct(strptime(Mcookie$timestamp,"%m/%d/%Y %H:%M"))
> Mcookie$time <- ifelse(Mcookie$timestamp > strptime("07-08-2008-00:00","%m-%d-%Y-%H:%M",1,0)

My second question is about finding the time difference in seconds between the first time a user sees something and the last, an engagement time essentially. I see there is the difftime function; is there a more elegant way of working this out than my thoughts (pseudo code below)?

sort data by user_id and Timestamp
take the head of user_id as new_time_var
take the tail of user_id as new_time_var2
use difftime(new_time_var, new_time_var2, units="secs")

Mike
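A minimal sketch of both steps on a few rows from the sample data. It assumes the strings are day/month/two-digit-year (27/05/08 cannot be month 27), so the format is "%d/%m/%y %H:%M" rather than the "%m/%d/%Y" used in the post. The cutoff date and the UTC time zone are made up for the example.

```r
# A few rows from the sample data
Mcookie <- data.frame(
  timestamp = c("27/05/08 22:57", "27/05/08 23:00", "05/06/08 11:34",
                "29/05/08 23:08", "29/05/08 22:52"),
  user_id = c("763830873067", "763830873067", "763830873067",
              "765253440317", "765253440317"),
  stringsAsFactors = FALSE
)

# Day/month/two-digit year; tz fixed so the arithmetic is reproducible
Mcookie$timestamp <- as.POSIXct(Mcookie$timestamp,
                                format = "%d/%m/%y %H:%M", tz = "UTC")

# Question 1: flag rows after a (hypothetical) cutoff date
Mcookie$after <- Mcookie$timestamp > as.POSIXct("2008-06-01", tz = "UTC")

# Question 2: seconds between each user's first and last event;
# no sorting needed because min/max do not depend on row order
engagement <- tapply(as.numeric(Mcookie$timestamp), Mcookie$user_id,
                     function(s) max(s) - min(s))
```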
[R] comparing with lead function
Hi All,

I've been trying to check whether the previous value in a variable is equal to a binary value (i.e. I want to check if the last event was a yes or no).

I've been trying to write some code for this, but it seems overly elaborate. Can anyone suggest a better / shorter / neater way?

The code below doesn't quite work, but it shows my idea: split by the factor id, create a new vector that is the lead, and then use an ifelse clause. But as I said, this seems very elaborate.

My sample code is below:

DF <- read.table(textConnection("timestamp; id
2008-05-27 22:57:00; 763830873067
2008-05-27 23:00:00; 763830873067
2008-05-27 23:01:00; 763830873067
2008-05-27 23:01:00; 763830873067
2008-06-05 11:34:00; 763830873067
2008-05-29 23:08:00; 765253440317
2008-05-29 23:06:00; 765253440317
2008-05-29 22:52:00; 765253440317
2008-05-29 22:52:00; 765253440317
2008-05-29 23:04:00; 765253440317
2008-06-27 19:34:00; 765253440317
2008-07-09 15:45:00; 765329002557
2008-07-06 19:24:00; 765329002557
2008-07-09 15:46:00; 765329002557
2008-07-07 13:05:00; 765329002557
2008-05-16 22:40:00; 765329002557
2008-06-08 11:24:00; 765329002557
2008-06-08 12:33:00; 765329002557"),as.is =TRUE,sep=";",strip.white=TRUE,header=TRUE)
closeAllConnections()

DF$time <- ifelse(DF$timestamp > as.POSIXct("2008-07-01"), 1, 0)

last_event <- lapply(split(test, test$ID), function(.df){
  lead_func_temp <- c(NA,.df$TIME [ - length(.df$TIME)])
  temp <- data.frame(ID=as.character(.df$ID),TIME=.df$TIME,
                     DIFF=rep(lead_func_temp,nrow(.df)))
  return(temp)
})

DF$last_event <- do.call(rbind, last_event)

Mike
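A minimal sketch of "previous value within each id" using ave(), which keeps the result aligned with the original rows and so avoids the split / rbind round trip. The toy data are made up, and rows are assumed to be already sorted by time within each id.

```r
# Toy data: 'time' is the binary flag, rows already ordered within id
DF <- data.frame(id   = c("a", "a", "a", "b", "b"),
                 time = c(0, 1, 0, 1, 1))

# Shift each group's values down by one; the first row of a group
# has no previous event, so it gets NA
DF$last_event <- ave(DF$time, DF$id,
                     FUN = function(v) c(NA, v[-length(v)]))
```

A comparison such as DF$last_event == 1 then answers whether the previous event was a "yes".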
[R] Paste in a FOR loop
Hi All,

I've been having a little trouble using R2HTML and a loop, but can't figure out where the problem lies. Any hints gratefully received.

My code at the minute (which does work) is the following:

library(R2HTML)
HTMLStart(outdir = file.path("C://Example_work","R_projects","Dynamic_creative"),filename = "RMDC_mockup",Title="Mock up for RMDC")
summary(z.out.1)
summary(s.out.1)
hist(s.out.1$qi$ev)
HTMLplot()
.
.
.
summary(z.out.3)
summary(s.out.3)
hist(s.out.3$qi$ev)
HTMLplot()
HTMLStop()

This seemed a rather long-winded way of doing things, and a simple for loop should handle it, as later I want it to be dynamic for a number of groups. So my new code (not working) is:

library(R2HTML)
HTMLStart(outdir = file.path("C://Example_work","R_projects","Dynamic_creative"),filename = "RMDC_mockup",Title="Mock up for RMDC")
for(group in 1:3){
  paste("summary(z.out.", group, sep = "")
  paste("summary(s.out.", group, sep = "")
  paste("s.out.",group,"$qi$ev", sep = "")
  HTMLplot()
}
HTMLStop()

which returns the error

Error in dev.print(device = png, file = AbsGraphFileName, width = Width, :
  no device to print from

Can anyone offer some advice here?

Mike
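A minimal sketch of the lookup part of the loop, leaving R2HTML out. paste() only builds a character string; it never evaluates it, so no summary is computed and no histogram is drawn, which would explain the "no device to print from" error. One option is get() on the constructed name. The s.out.* objects below are toy stand-ins for the fitted objects in the post.

```r
# Toy stand-ins for the simulation objects s.out.1 ... s.out.3
s.out.1 <- list(qi = list(ev = c(1, 2, 3)))
s.out.2 <- list(qi = list(ev = c(4, 5, 6)))
s.out.3 <- list(qi = list(ev = c(7, 8, 9)))

means <- numeric(3)
for (group in 1:3) {
  # Fetch the object whose name is built from the loop index
  s.out <- get(paste0("s.out.", group))
  # The real loop would call print(summary(s.out)) and hist(s.out$qi$ev) here
  means[group] <- mean(s.out$qi$ev)
}
```

Inside a for loop results are not auto-printed, so summaries need an explicit print(). Keeping the fits in a list and indexing it with [[group]] would avoid get() altogether.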
[R] t.test in a loop
Hi All,

I've been having a little trouble with creating a loop that will run a series of t.tests for inspection.

Below is the code I've tried, and some checks I've looked at.

I've used the get(paste()) idea, as I was told previously that the use of eval should be avoided.

I've run a single call to check that my syntax is correct, and it works without any problems:

> t.test(channel.data.train$News~channel.data.train$power)

Can anyone offer any advice?

Many thanks

Mike

> str(channel.data.train$power)
 num [1:9913] 0 0 0 0 0 0 0 0 0 0 ...

> summary(channel.data.train$power)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.      0.      0.  0.2368      0.      1.

> names(channel.data.train)
 [1] "News"              "Entertainment"     "Communicate"
 [4] "Lifestyle"         "Games"             "Music"
 [7] "Money"             "Celebrity"         "Shopping"
[10] "Sport"             "Film"              "Travel"
[13] "Cars"              "Property"          "Chat"
[16] "Bet.Play.Win"      "config"            "exposed"
[19] "site"              "referrer"          "started"
[22] "last_viewed"       "num_views"         "secs_since_viewed"
[25] "register"          "secs.na"           "power"
[28] "tt"

> for(i in names(channel.data.train[,c(1:16)])){
+ t.test(get(paste("channel.data.train$",i,"~channel.data.train$power",sep="")))
+ }
Error in get(paste("channel.data.train$", i, "~channel.data.train$power", :
  variable "channel.data.train$News~channel.data.train$power" was not found

--
Michael Pearmain
Senior Analytics Research Specialist
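A minimal sketch of looping over t-tests by building a formula from each column name. get() looks up a single object by name, so it cannot resolve a whole expression such as "channel.data.train$News~channel.data.train$power"; as.formula() together with the data argument of t.test() avoids the problem. The built-in mtcars data stand in for channel.data.train, with am as the two-level grouping variable.

```r
# mtcars stands in for channel.data.train; 'am' plays the role of 'power'
vars <- c("mpg", "hp", "wt")

results <- list()
for (v in vars) {
  f <- as.formula(paste(v, "~ am"))          # e.g. mpg ~ am
  results[[v]] <- t.test(f, data = mtcars)   # store instead of relying on auto-print
}

# Collect the p-values for a quick inspection
p_values <- sapply(results, function(r) r$p.value)
```

Storing the results in a list matters because nothing is printed automatically inside a for loop.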
[R] Forecasting with dlm
Hi All,

I have a problem trying to forecast using the dlm package. Can anyone offer any advice?

I set up my problem as follows (following the manual as much as possible).

Data for example to run the code:

CostUSD <- c(27.24031,32.97051, 38.72474, 22.78394, 28.58938, 49.85973, 42.93949, 35.92468)

library(dlm)
buildFun <- function(x) {
  dlmModPoly(1, dV = exp(x[1]), dW = exp(x[2]))
}
fit <- dlmMLE(CostUSD, parm = c(0,0), build = buildFun)
fit$conv
dlmCostUSD <- buildFun(fit$par)
V(dlmCostUSD)
W(dlmCostUSD)
# For comparison
StructTS(CostUSD, "level")
CostUSDFilt <- dlmFilter(CostUSD, dlmCostUSD)
CostUSDFore <- dlmForecast(CostUSDFilt, nAhead = 1)

after which I get the error message:

Error in mod$m[lastObsIndex, ] : incorrect number of dimensions

Can anyone offer any insight into this problem?

Thanks in advance

Mike
[R] Loops with String replacement
Hi All,

I'm trying to split my dataset into multiple datasets that I'll analyse later. I wanted to do this dynamically, as I might need to rerun the code later.

I was looking at doing this via a loop. (Are other methods more appropriate? Would a function be better?)

However, I'm not sure how to do string replacement within the loop in R in order to create unique dataset names based on the number of 'Groups' I have.

Many thanks

Mike

My code is as follows:

no.groups <-names(table(Conv$Group))
for (i in length(Conv$no.groups)) {
  groupi <- subset(Conv, Conv$Group == i)
}

    Metal Secs cost  Income stable Group
1  Chrome   60   14  3.3458      1     2
2  Chrome   51   10  1.8561      0     1
3  Chrome   24   12  0.6304      0     1
4  Chrome   38    8  3.4183      1     2
5  Chrome   25   12  2.7852      1     3
6  Chrome   67   12  2.3866      1     1
7  Chrome   40   12  4.2857      0     1
8  Chrome   56   10  9.3205      1     1
9  Chrome   32   12  3.8797      1     3
10 Chrome   75   16  2.7031      1     3
11 Chrome   46   15 11.2307      1     2
12 Chrome   52   12  8.6696      1     2
13 Chrome   22   12  1.7443      0     2
14 Chrome   60   12  0.2253      0     2
15 Chrome   24   14  4.3348      1     3

--
Michael Pearmain
Senior Analytics Research Specialist
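A minimal sketch of splitting by group without constructing variable names: split() returns a named list with one data frame per level of Group, and the pieces can be pulled out by name or processed with lapply(). The data are the first five rows of the sample, with only two of the columns.

```r
# First five rows of the sample data, two columns only
Conv <- data.frame(Secs  = c(60, 51, 24, 38, 25),
                   Group = c(2, 1, 1, 2, 3))

# One data frame per group, named after the group value
groups <- split(Conv, Conv$Group)
group_names <- names(groups)

# Access one piece by name, or summarise every piece in one go
group2 <- groups[["2"]]
mean_secs <- sapply(groups, function(g) mean(g$Secs))
```

Because the number of list elements follows the data, rerunning the code with more or fewer groups needs no changes.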