[R] Aggregation across two variables in data.table
Dear all,

I have a data.frame that includes a series of demographic variables for a set of respondents plus a dependent variable (Theta). For example:

   Age Education                                Marital       Familysize Income Housing                Theta
1:  50 Associate degree                         Divorced               4 70K+   Owned with mortgage    9.140000
2:  65 Bachelor degree                          Married                1 10-15K Owned without mortgage 7.345036
3:  33 Bachelor degree                          Married                2 30-40K Owned with mortgage    7.974937
4:  69 Bachelor degree                          Never married          1 70K+   Owned with mortgage    7.733053
5:  54 Some college, less than college graduate Never married          3 30-40K Rented                 7.648642
6:  35 Associate degree                         Separated              2 10-15K Rented                 7.496411

My objective is to calculate the average of Theta across all pairs of two demographics. For one demographic this is straightforward:

Demo_names <- c("Age", "Education", "Marital", "Familysize", "Income", "Housing")
means1 <- as.list(rep(0, length(Demo_names)))
for (i in 1:length(Demo_names)) {
  Demo_tmp <- Demo_names[i]
  means1[[i]] <- data_tmp[, list(mean(Theta)), by = Demo_tmp]
}

Is there an easy way to extend this logic to more than one variable? I know how to do this manually, e.g.,

data_tmp[, list(mean(Theta)), by = list(Marital, Education)]

but I don't know how to integrate this into a loop.

Thanks,
Michael

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
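A sketch of how the per-pair loop could look, using combn() to enumerate all unordered pairs of variables. The toy data_tmp below is made up for illustration, and base aggregate() stands in for the data.table call (which would be data_tmp[, mean(Theta), by = p] inside the same lapply):

```r
# Toy stand-in for data_tmp (hypothetical values)
data_tmp <- data.frame(
  Marital   = c("Married", "Married", "Divorced", "Divorced"),
  Education = c("BA", "BA", "BA", "MA"),
  Income    = c("Low", "High", "Low", "High"),
  Theta     = c(7.3, 7.9, 9.1, 7.6)
)
Demo_names <- c("Marital", "Education", "Income")

# All unordered pairs of demographic variables
pairs <- combn(Demo_names, 2, simplify = FALSE)

# One aggregated table per pair of grouping variables
means2 <- lapply(pairs, function(p)
  aggregate(data_tmp["Theta"], by = data_tmp[p], FUN = mean))
```

Each element of means2 is then the mean of Theta grouped by one pair of demographics.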
[R] Speeding up npreg
Dear all,

I am using npreg from the np package to run a kernel regression. My dataset is relatively large, with about 3,000 observations. The dependent variable is continuous, and I have a total of six independent variables -- two continuous, two ordinal and two categorical. The model converges without problems, but it takes a very long time to do so (nearly one hour).

Is there any way to speed up the npreg function to decrease the running time? Or is there another function/package for kernel regression that may be faster?

Any advice would be much appreciated.

Thanks,
Michael
[R] Fractional Factorial Design on 4-level factor
Dear all,

I am running a simulation experiment with 8 factors that each have 4 levels. Each combination is repeated 100 times, so a full factorial would mean 100*4^8 = 6,553,600 runs. I am trying to reduce the number of scenarios to run using a fractional factorial design.

I'm interested in estimating the main effects of the 8 factors plus their 2-way interactions. Any higher-order interactions are not of interest to me. My plan is to use a standard OLS regression for that, once the simulations are over.

I tried to use the FrF2 package to derive a fractional factorial design, but it seems that it only works for factors on two levels. Any idea how I could derive a fractional factorial design for factors with four levels?

Thanks for your help,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
[R] lm model with many categorical variables
Dear all,

I am trying to estimate an lm model with one continuous dependent variable and 11 independent variables that are all categorical, some of which have many categories (several dozen in some cases). I am not interested in statistical inference to a larger population. The objective of my model is to find a way to best predict my continuous variable within the sample.

When I run the lm model I evidently get many regression coefficients that are not significant. Is there some way to automatically combine levels of a categorical variable if the regression coefficients for the individual levels are not significant? My idea is to find some form of grouping of the different categories that allows me to work with fewer levels while keeping or even improving the quality of predictions.

Thanks,
Michael
[R] Teaching materials for R course
Dear all,

I am a professor at a business school and I would like to develop a course about quantitative research using R. My current plan is that the course should cover (a) an introduction (assuming that students have never used R before), (b) basic econometric analysis (e.g., regression, logit) as well as (c) structural equation modelling.

Are there any textbooks and teaching materials (e.g., PowerPoint slides) that one of you could recommend for me to have a look at?

Thanks,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
[R] Running R Remotely on LINUX
Dear all,

I am used to running R locally on my Windows-based PC. Since some of my computations are taking a lot of time, I am now trying to move to a remote R session on a Linux server, but I am having trouble getting things to work. I am able to access the Linux server using PuTTY and SSH. Once I have access I can log in with my username and password (which is asked for through keyboard-interactive authentication). I can then open an R session.

Since I am not used to working with Linux, I have several questions:

(1) Ideally I am looking for Windows-based software that would allow me to work with R as I am used to, with the difference that the computations are run remotely on the Linux server. Does such software exist? Please note that I do not think that I can install any software on the Linux server, but I can install things on my Windows-based PC.

(2) I am running an extensive simulation that takes about one week to run. Right now it seems that when I log out of R on Linux and close PuTTY, the R session closes as well. Is there a way to let R run in the background for the week and just check on the progress 1-2 times a day?

(3) Can I open several instances of R in parallel? On my PC I sometimes have 2-3 windows open in parallel that work on different calculations to save time. I am not sure to what extent this is possible on Linux.

I assume that these questions are very naïve. But since I'm only used to working with Windows, I'm quite stuck at the moment. Any help would be very much appreciated!

Thanks in advance,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
[R] Instagram Analysis
Dear all,

I'm looking for an R package that allows me to analyze Instagram. Specifically, I would like to download, for a given account, the list of other accounts that this account follows or that follow this account (the follower and following lists). I know there is instaR, but this package is quite old (August 2016) and seems not to have been updated in the meantime. Is there a newer package or any other way to get this information easily?

Thanks,
Michael
[R] Bayesian multivariate linear regression
Dear all,

I'm looking for a package that allows me to run a Bayesian multivariate linear regression and extract predicted values. In essence, I'm looking for the equivalent of lm and predict.lm in a Bayesian framework. I have found several packages that allow one to run a Bayesian multivariate linear regression (e.g., bayesm), but those do not seem to have a prediction function. And the ones with prediction (e.g., MCMCpack) do not support multiple dependent variables.

If you have any pointers, please let me know.

Best wishes,
Michael

Michael Haenlein
ESCP Europe
Paris, France
[R] igraph -- Selecting closest neighbors
Dear all,

I am looking for a function to select the N closest neighbors (in terms of distance) of a vertex in igraph. Assume for example N=7. If the vertex has 3 direct neighbors, I would like the function to select those 3 plus a random 4 among the second-degree neighbors. Is there some way to do this efficiently? I have been trying to program something using ego() with varying levels of distance, but I have not managed to get a conclusive solution.

Thanks for your help,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
[R] Social Network Simulation
Dear all,

I am trying to simulate a series of networks that have characteristics similar to real-life social networks. Specifically, I am interested in networks that have (a) a reasonable degree of clustering (as measured by the transitivity function in igraph) and (b) a reasonable degree of degree polarization (as measured by the average degree of the top 10% of nodes with highest degree divided by the overall average degree).

Right now I am using two functions from igraph (sample_pa and sample_smallworld), but these are not ideal since they only allow me to vary one of the two characteristics. Either the network has good clustering but not enough polarization, or the other way round. I looked around and found some network algorithms that solve the problem (e.g., Jackson and Rogers, "Meeting Strangers and Friends of Friends"), but I did not find them implemented in an R package. I also found the R package NetSim, which seems to be in this spirit, but I cannot get it to work.

Could anyone point me to an R library that I could check out? I do not care much about the specific algorithm used as long as it allows me to vary clustering and degree polarization within certain ranges.

Thanks,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe, Paris
[R] (Small) programming job related to network analysis
Dear all,

I am looking for help in programming three functions. Those functions should simulate (social) networks according to the processes described in:

(1) A.H. Dekker, "Realistic Social Networks for Simulation using Network Rewiring" (http://www.mssanz.org.au/MODSIM07/papers/13_s20/RealisticSocial_s20_Dekker_.pdf)

(2) Konstantin Klemm and Víctor M. Eguíluz, "Growing scale-free networks with small-world behavior" (http://ifisc.uib-csic.es/victor/Nets/sw.pdf)

(3) Petter Holme and Beom Jun Kim, "Growing Scale-Free Networks with Tunable Clustering" (http://arxiv.org/pdf/cond-mat/0110452.pdf)

I am looking for three functions (e.g., sample_dekker, sample_klemm, sample_holme) that generate output similar to the functions sample_pa and sample_smallworld in the R package igraph. The input should be the number of nodes in the network (e.g., 1000) and any other parameters those models require.

In case this is of relevance to you, please get in touch with me by email to discuss further details.

Thanks,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
[R] Simulate data from Structural Equation Model
Dear all,

I am looking for an R package or code that allows me to simulate data consistent with a given structural equation model. Essentially my idea is to define (a) the number of endogenous and exogenous latent variables, (b) the strength of the relationships between them and (c) the way of measurement (number of indicators, distribution of indicators), and to obtain simulated data consistent with this specification.

I know there is some literature on this topic (e.g., Mattson, S. (1997). How to generate non-normal data for simulation of structural equation models. Multivariate Behavioral Research, 32(4), 355-373), but I do not know whether some of these approaches have already been implemented in R and/or whether better methods exist.

Any help would be very much appreciated.

Thanks,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
[R] Help for programming a short code in R
Dear all,

I'm looking for a person who could help me program a short piece of code in R. The code involves Bayesian analysis, so some familiarity with WinBUGS or another package/software dealing with Bayesian estimation would be helpful.

I have an academic paper in which the code is described (Abe, M. (2009), "'Counting your customers' one by one: A hierarchical Bayes extension to the Pareto/NBD model," Marketing Science, Vol. 28 No. 3, pp. 541-553) as well as one of the datasets mentioned in this manuscript to test the code. My assumption is that the job does not take very long, although I cannot give a precise estimate of the number of hours required.

If anyone is interested, please let me know and I can send you an electronic copy of the manuscript mentioned above.

Best,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe - The School of Management for Europe
[R] Convert character string to top levels + NAN
Dear all,

I have several character vectors with a high number of distinct values; length(unique(x)) is in the range of 100-200 for each. This creates problems as I would like to use them as predictors in a coxph model.

I therefore would like to convert each of these vectors to a new one (x_new). x_new should be equal to x for the top n categories (i.e., the n levels with the highest number of occurrences) and NA elsewhere. For example, for n=3, x_new would contain the three most common levels of x plus NA.

Is there some convenient way of doing this?

Thanks in advance,
Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
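Here is a minimal sketch of the kind of helper I have in mind (top_n_levels is a made-up name, and the x below is toy data); note that the natural missing value for character data in R would be NA rather than NaN:

```r
# Collapse a character vector to its n most frequent values;
# everything else becomes NA
top_n_levels <- function(x, n = 3) {
  top <- names(sort(table(x), decreasing = TRUE))[seq_len(n)]
  ifelse(x %in% top, x, NA)
}

x <- c("a", "a", "a", "b", "b", "c")
x_new <- top_n_levels(x, n = 2)  # keeps "a" and "b", turns "c" into NA
```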
[R] Interpreting output of coxph with frailty.gamma
Dear all,

This is probably a very silly question, but could anyone tell me what the different parameters in a coxph model with a frailty.gamma term mean? Specifically, I have two questions:

(1) Compared to a "normal" coxph model, I obtain two standard errors [se(coef) and se2]. What is the difference between them?

(2) Again compared to a "normal" coxph model, the z/p-values are replaced by a chi-squared test (Chisq, DF, p). What is the reason for this? Does a standard z-test not work once a frailty term is included?

Thanks very much for your help in advance,
Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
[R] Approximating discrete distribution by continuous distribution
Dear all,

I have a discrete distribution showing how age is distributed across a population using a certain set of bands:

Age <- matrix(c(74045062, 71978405, 122718362, 40489415), ncol = 1,
              dimnames = list(c("<18", "18-34", "35-64", "65+"), c()))
Age_dist <- Age / sum(Age)

For example, I know that 23.94% of all people are between 0 and 18 years old, 23.28% between 18 and 34 years, and so forth. I would like to find a continuous approximation of this discrete distribution in order to estimate the probability that a person is, for example, 16 years old. Is there some automatic way in R through which this can be done? I tried a kernel density estimation of the histogram, but this does not seem to provide what I'm looking for.

Thanks very much for your help,
Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
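One simple possibility I considered, in case it helps frame the question: treat each band's share as mass spread uniformly within the band, which gives a piecewise-linear CDF. The band boundaries (0 as the lower bound and an upper cap of 100 for the 65+ group) are assumptions on my part:

```r
Age <- c("<18" = 74045062, "18-34" = 71978405,
         "35-64" = 122718362, "65+" = 40489415)
breaks <- c(0, 18, 35, 65, 100)  # assumed band boundaries and upper cap

# Piecewise-linear CDF through the cumulative band proportions
cdf <- approxfun(breaks, c(0, cumsum(Age / sum(Age))))

# Probability that a person is between 16 and 17 years old
p16 <- cdf(17) - cdf(16)
```

This is crude (it assumes uniformity within bands), which is why I am asking whether there is a more principled automatic approach.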
[R] predict.lm
Dear all,

I would like to use predict.lm to obtain a set of predicted values based on a regression model I estimated. When I apply predict.lm to two vectors that have the same values, the predicted values are identical. I know that my regression model is not perfect, and I would like to take account of the error inherent in the model within my predictions. So, while I understand that the expected value for both vectors should be the same (since they have the same values), I would like to have different predictions that take account of the error inherent in my model.

I assume I can probably use se.fit to achieve my objective of including "random error" in my predictions, but I don't really know how. Could anybody give me a pointer on how this can be done?

Thanks,
Michael
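To make the question concrete, here is a sketch of what I am after, on the built-in cars data rather than my actual model. My (possibly wrong) understanding is that the residual standard error, rather than se.fit (which measures uncertainty in the fitted mean), is the relevant quantity when simulating new observations:

```r
fit <- lm(dist ~ speed, data = cars)
newdata <- data.frame(speed = c(10, 10))   # two identical inputs

mu    <- predict(fit, newdata)             # identical expected values
sigma <- summary(fit)$sigma                # residual standard error

set.seed(1)
draws <- mu + rnorm(length(mu), mean = 0, sd = sigma)
# draws now differ between the two rows even though the inputs are equal
```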
[R] lp.transport in package lpSolve
Dear all,

I'm working on a very complex linear optimization problem using the lp.transport function in lpSolve. My PC has 10 cores, but by default R uses only one of them. Is there a straightforward way to make lp.transport use all available cores?

I had a look at the "High-Performance and Parallel Computing with R" task view (http://cran.r-project.org/web/views/HighPerformanceComputing.html), but I have the impression that using multiple cores would require me to change the function underlying lp.transport. The problem is that I'm not sure whether I'm able to make those adjustments.

Thanks,
Michael
[R] R in remote mode
Dear all,

I have written a simulation in R that has a significant running time (probably 60-80 hours). While I can run the code on my laptop, it tends to slow things down to a significant extent and leads to a very high CPU temperature.

Is there an easy and convenient way to run R remotely on some outside server or PC? Any services that you are aware of? I know that there is a way to run R on Amazon EC2, but I'm wondering whether there is something even simpler. Ideally I am looking for remote access to a PC where R is already installed and where I can simply copy-paste my code and run it.

Please let me know in case you have any ideas.

Thanks in advance,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
[R] boxcox alternative
Dear all,

I am working with a set of variables that are very non-normally distributed. To improve the performance of my model, I'm currently applying a Box-Cox transformation to them. While this improves things, the performance is still not great.

So my question: are there any alternatives to boxcox in R? I would need a method that estimates the "best" transformation automatically, without input from the user, since my approach should be flexible enough to deal with any kind of distribution. boxcox allows me to do this by picking the lambda that leads to the best fit, but I wonder whether there are other options out there.

Thanks,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
[R] Looking for consultant in mathematics/ statistics
Dear all,

I am looking for a consultant who can help me solve a mathematical/statistical problem. The problem is more conceptual in nature (how to solve a given problem analytically) than programming-related, although I would also need some programming support later, once the analytic solution has been found. My question relates to categorical variables that are represented by underlying latent variables.

If you think you could help, please send me an email. I will then describe the problem in more detail and we can agree on fees and timelines. My gut feeling is that it will not take long (perhaps a couple of hours), but I might be wrong.

Looking forward to hearing from you,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
[R] Convert Mathematica code into R
Dear all,

I have a reasonably short piece of code written in Mathematica 5.2 which I would like to convert to R. The problem is that I'm not familiar with Mathematica. I would, however, also be OK with some interface that allows me to run Mathematica from within R and use the output of the Mathematica code for further analysis within R.

Any advice on how to conveniently convert the code or on how to run Mathematica from within R?

Thanks,
Michael
[R] Batch file export
Dear all,

I have code that generates data vectors within R. For example, assume:

z <- rlnorm(1000, meanlog = 0, sdlog = 1)

Every time a vector has been generated I would like to export it into a csv file. So my idea is something as follows:

for (i in 1:100) {
  z <- rlnorm(1000, meanlog = 0, sdlog = 1)
  write.csv(z, "c:/z_i.csv")
}

where "z_i.csv" is a filename that is related to the run (e.g., z_001.csv, z_002.csv, ...). Could anyone please advise me on the most convenient way of doing this?

Thanks very much in advance,
Michael
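For what it's worth, here is a sketch of the kind of loop I am after, with the file name built via sprintf (zero-padded so the files sort correctly); the write.csv call is commented out here only so the sketch does not write to disk, and I do not know whether this is the most convenient approach:

```r
for (i in 1:100) {
  z <- rlnorm(1000, meanlog = 0, sdlog = 1)
  fname <- sprintf("c:/z_%03d.csv", i)  # z_001.csv, z_002.csv, ...
  # write.csv(z, fname, row.names = FALSE)
}
```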
[R] Printing status updates in while-loop
Dear all,

I'm using a while loop in the context of an iterative optimization procedure. Within my while loop I have a counter variable that helps me determine how long the loop has been running. Before the loop I initialize it with counter <- 0, and the last statement within my loop is counter <- counter + 1.

I'd like to print out the current value of counter while the loop is running, to know where the optimization routine stands. I tried to do so by adding print(counter) within the while loop. This does not seem to work, however: instead of printing regular updates, all the print output appears only after the loop has finished. Is there some easy way to print regular status updates while the while loop is still running?

Thanks,
Michael
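My current attempt looks roughly like the sketch below (the loop body is a trivial stand-in for my real optimization step). My understanding is that flush.console() may be needed, at least in the Windows GUI where console output is buffered, to make the updates appear while the loop runs:

```r
counter <- 0
while (counter < 100) {
  # ... one optimization step would go here ...
  counter <- counter + 1
  if (counter %% 10 == 0) {   # report every 10th iteration
    cat("iteration:", counter, "\n")
    flush.console()           # force the output to appear immediately
  }
}
```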
[R] Import graph object
Dear all,

I have a txt file of the following format that describes the relationships within a network of a certain number of nodes:

{4, 2, 3}
{3, 4, 1}
{4, 2, 1}
{2, 1, 3}
{2, 3}
{}
{2, 5, 1}
{3, 5, 4}
{3, 4}
{2, 5, 3}

For example, the first line {4, 2, 3} implies that there is a connection between Node 1 and Node 4, between Node 1 and Node 2, and between Node 1 and Node 3. The second line {3, 4, 1} implies that there are connections between Node 2 and Nodes 3, 4 and 1. Note that some nodes can be isolated (i.e., not have any connections to any other node), which is then indicated by {}. Also note that the elements in each row are not necessarily ordered (i.e., {4, 2, 3} instead of {2, 3, 4}).

I would like to (a) read the txt file into R and (b) convert it to an adjacency matrix. For example, the adjacency matrix corresponding to the aforementioned example is as follows:

0 1 1 1 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 1 1 0 1 0 0 0 0 0

Is there any convenient way of doing this?

Thanks,
Michael
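A sketch of what I have tried so far for part (b), with the ten example rows pasted in directly in place of reading the file (the "network.txt" name in the comment is just a placeholder):

```r
# Stand-in for: lines <- readLines("network.txt")
lines <- c("{4, 2, 3}", "{3, 4, 1}", "{4, 2, 1}", "{2, 1, 3}", "{2, 3}",
           "{}", "{2, 5, 1}", "{3, 5, 4}", "{3, 4}", "{2, 5, 3}")

n <- length(lines)
adj <- matrix(0, n, n)
for (i in seq_len(n)) {
  body <- gsub("[{} ]", "", lines[i])        # strip braces and spaces
  if (nzchar(body)) {                        # skip isolated nodes: {}
    nbrs <- as.integer(strsplit(body, ",")[[1]])
    adj[i, nbrs] <- 1
  }
}
# adj now reproduces the example matrix; adding adj <- pmax(adj, t(adj))
# would symmetrise it if the relationships should be undirected
```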
[R] Consultant for Mathematica -> R translation
Dear all,

I have a very short piece of code written in Mathematica which I need translated for use in R. I'm not an expert in Mathematica (which is why I would not feel comfortable doing the translation myself), but the code is very short (probably 30-40 lines) and looks quite simple from my perspective.

Anyone who would be interested in taking on this job, please get in touch with me so that we can agree on terms and conditions.

Thanks,
Michael
[R] Equivalent to go-to statement
Dear all,

I'm working with code that consists of two parts. In Part 1 I generate a random graph using the igraph library (which represents the relationships between different nodes) and a vector (which represents a certain characteristic of each node):

library(igraph)
g <- watts.strogatz.game(1, 100, 5, 0.05)
z <- rlnorm(100, 0, 1)

In Part 2 I iteratively change the elements of z in order to reach a certain value of a certain target variable. I do this using a while statement:

while (target_variable < threshold) {
  ## adapt z
}

The problem is that in some rare cases this iterative procedure can take very long (a couple of million iterations), depending on the specific structure of the graph generated in Part 1. I therefore would like to change Part 2 of my code so that once a certain threshold number of iterations has been reached, the iterative process in Part 2 stops and the code goes back to Part 1 to generate a new graph structure. So my idea is as follows:

- Run Part 1 and generate g and z
- Run Part 2 and iteratively modify z to maximize the target variable
- If Part 2 finishes in fewer than X steps, then go on to Part 3
- If Part 2 takes more than X steps, then go back to Part 1 and start again

I think that R does not have a function like "go-to" or "go-back". Does anybody know of a convenient way of doing this?

Thanks very much for your help,
Michael
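A sketch of the control flow I am considering, using repeat with break in place of a go-to. The target test sum(z) > 50 is a meaningless stand-in for my real target variable, and the graph generation is omitted so the sketch is self-contained:

```r
max_steps <- 1000
set.seed(42)
repeat {
  ## Part 1: generate a new z (and, in the real code, a new graph g)
  z <- rlnorm(100, 0, 1)

  ## Part 2: iterate at most max_steps times
  steps <- 0
  converged <- FALSE
  while (steps < max_steps) {
    steps <- steps + 1
    # ... adapt z here ...
    if (sum(z) > 50) { converged <- TRUE; break }  # stand-in target test
  }

  if (converged) break  # success: fall through to Part 3
  ## otherwise the repeat loop restarts Part 1 with a fresh graph
}
```

I am unsure whether this is idiomatic, hence the question.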
[R] Time-dependent covariates in survreg function
Dear all,

I'm doing a survival analysis with time-dependent covariates. Until now, I have used a simple Cox model for this, specifically the coxph function from the survival package. Now I would like to try out an accelerated failure time model with a parametric specification, as implemented for example in the survreg function.

Two questions: First, can survreg handle time-dependent covariates? The documentation for this function does not make reference to them. And second, in case survreg cannot deal with time-dependent covariates, is there a similar function in some other package that can?

Thanks very much,
Michael
[R] predict.coxph and predict.survreg
Dear all,

I'm struggling with predicting "expected time until death" from a coxph or survreg model.

I have two datasets. Dataset 1 includes a certain number of people for whom I know a vector of covariates (age, gender, etc.) and their event times (i.e., I know whether they have died, and when, if death occurred prior to the end of the observation period). Dataset 2 includes another set of people for whom I only have the covariate vector. I would like to use Dataset 1 to calibrate either a coxph or survreg model and then use this model to determine an "expected time until death" for the individuals in Dataset 2. For example, I would like to know when a person in Dataset 2 will die, given his/her age and gender.

I checked predict.coxph and predict.survreg as well as the document "A Package for Survival Analysis in S" written by Terry M. Therneau, but I have to admit that I'm a bit lost here. Could anyone give me some advice on how this could be done?

Thanks very much in advance,
Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
Re: [R] predict.coxph and predict.survreg
Thanks very much for your answers, David and Mattia. I understand that the baseline hazard in a Cox model is unknown and that this makes the calculation of expected survival difficult. Does this change when I move to a survreg model instead?

I think I'm OK with estimating a Cox model (or a survreg model), as I've done so in the past. But I'm lost with the different options in the prediction part (e.g., linear, quantile, risk, expected, ...). Is there any document that explains what these options mean?

Sorry in case these questions are naive ... hope they're not too stupid ;-)

On Thu, Nov 11, 2010 at 5:03 PM, Mattia Prosperi wrote:
> Indeed, from the predict() function of the coxph you cannot get
> "time" predictions directly, but only linear and exponential risk
> scores. This is because, in order to get the time, a baseline hazard
> has to be computed, and it is not straightforward since it is implicit
> in the Cox model.
>
> 2010/11/11 David Winsemius :
> > On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:
> > >> [original question quoted above]
> >
> > The first step would be creating a Surv object, followed by running a
> > regression that creates a coxph object, using dataset1 as input. So you
> > should be looking at:
> >
> > ?Surv
> > ?coxph
> >
> > There are worked examples in the help pages. You would then run predict()
> > on the coxph fit with "dataset2" as the newdata argument. The default
> > output is the linear predictor for the log-hazard relative to a mean
> > survival estimate, but other sorts of estimates are possible. The survfit
> > function provides a survival curve suitable for plotting.
> >
> > (You may want to inquire at a local medical school to find statisticians
> > who have experience with this approach. This is ordinary biostatistics
> > these days.)
> >
> > --
> > David Winsemius, MD
> > West Hartford, CT
Re: [R] predict.coxph and predict.survreg
Thanks for the comment, James!

The problem is that my initial sample (Dataset 1) is truncated. That means I only observe "time to death" for those individuals who actually died before the end of my observation period. It is my understanding that this type of truncation creates a bias when I use a "normal" regression analysis. Hence my idea to use some form of survival model.

I had another look at predict.survreg and I think the option "response" could work for me. When I run the following code I get ptime = 290.3648. I assume this means that an individual with ph.ecog=2 can be expected to live another 290.3648 days before death occurs (days is the time scale of the time variable). Could someone confirm whether this makes sense?

lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')

On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger wrote:
> Michael,
>
> You are looking to compute an estimated time to death -- rather than the
> odds of death conditional upon time. Thus, you will want to use "time to
> death" as your dependent variable rather than a dichotomous outcome
> (0=alive, 1=death). You can accomplish this with a straightforward
> regression analysis.
>
> Best,
>
> Jim
>
> >> [original question quoted above]
Re: [R] predict.coxph and predict.survreg
David, Mattia, James -- thanks so much for all your helpful comments! I now have a much better understanding of how to calculate what I'm interested in ... and what the risks are of doing so.

Thanks and all the best,

Michael

On Thu, Nov 11, 2010 at 7:33 PM, David Winsemius wrote:
> On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:
>
>> [previous message quoted above]
>>
>> I assume this means that an individual with ph.ecog=2 can be expected to
>> live another 290.3648 days before death occurs.
>
> It is a prediction under specific assumptions underpinning a parametric
> estimate.
>
>> Could someone confirm whether this makes sense?
>
> You ought to confirm that it "makes sense" by comparing to your data:
> require(Hmisc); require(survival)
>
> > describe(lung[lung$status==1 & lung$ph.ecog==2, "time"])
> lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
>       n missing  unique    Mean
>       6       0       6   293.7
>
>            92 105 211 292 511 551
> Frequency   1   1   1   1   1   1
> %          17  17  17  17  17  17
>
> > ?lung
>
> So status==1 is a censored case and the observed times are status==2.
>
> > describe(lung[lung$status==2 & lung$ph.ecog==2, "time"])
> lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
>       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
>      44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85
>
> lowest : 11 12 13 26 30, highest: 524 533 654 707 814
>
> And the mean time to death (in a group that had only 6 censored individuals
> at times from 92 to 551) was 226, and the median time to death among 44
> individuals is 178, with a right-skewed distribution. You need to decide
> whether you want to make that particular prediction when you know that you
> forced a specific distributional form on the regression machinery by
> accepting the default.
>
>> [survreg code and earlier messages quoted above]
>
> David Winsemius, MD
> West Hartford, CT
[R] Time-dependent covariates in survreg function
Dear all,

I'm asking this question again as I didn't get a reply last time:

I'm doing a survival analysis with time-dependent covariates. Until now, I have used a simple Cox model for this, specifically the coxph function from the survival package. Now, I would like to try out an accelerated failure time model with a parametric specification, as implemented for example in the survreg function.

Two questions: First, can survreg handle time-dependent covariates? The documentation for this function does not make reference to them. And second, in case survreg cannot deal with time-dependent covariates, is there a similar function in some other package that can?

Thanks very much,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
[R] Collinearity in Moderated Multiple Regression
Dear all,

I have one dependent variable y and two independent variables x1 and x2 which I would like to use to explain y. x1 and x2 are design factors in an experiment and are not correlated with each other. For example, assume that:

x1 <- rbind(1,1,1,2,2,2,3,3,3)
x2 <- rbind(1,2,3,1,2,3,1,2,3)
cor(x1,x2)

The problem is that I do not only want to analyze the effect of x1 and x2 on y but also of their interaction x1*x2. Evidently this interaction term has a substantial correlation with both x1 and x2:

x3 <- x1*x2
cor(x1,x3)
cor(x2,x3)

I therefore expect that a simple regression of y on x1, x2 and x1*x2 will lead to biased results due to multicollinearity. For example, even when y is completely random and unrelated to x1 and x2, I obtain a substantial R2 for a simple linear model which includes all three variables. This evidently does not make sense:

y <- rnorm(9)
model <- lm(y ~ x1 + x2 + x1*x2)
summary(model)

Is there some function within R or in some separate library that allows me to estimate such a regression without obtaining inconsistent results?

Thanks for your help in advance,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
Re: [R] Collinearity in Moderated Multiple Regression
Thanks for your comment! Actually, they are continuous variables which have a very low correlation -- I just wanted to make the whole story easier to explain. My general question is: does R offer an alternative to lm for situations where there is substantial collinearity between the independent variables? I have found the perturb package, but this seems to be focused on identifying collinearity, not on dealing with it.

Thanks,

Michael

On Tue, Aug 3, 2010 at 3:25 PM, Nikhil Kaza wrote:
> Are x1 and x2 factors (dummy variables)? cor does not make sense in
> that case.
>
> Nikhil Kaza
> Asst. Professor,
> City and Regional Planning
> University of North Carolina
>
> >> [original question quoted above]
Re: [R] Collinearity in Moderated Multiple Regression
Thanks very much -- it seems that ridge regression can do what I'm looking for!

Best,

Michael

-----Original Message-----
From: Nikhil Kaza
Sent: Tuesday, August 03, 2010 16:21
To: haenl...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Collinearity in Moderated Multiple Regression

My usual strategy for dealing with multicollinearity is to drop the offending variable or transform one of them. I would also check the vif functions in car and Design. I think you are looking for lm.ridge in the MASS package.

Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

On Aug 3, 2010, at 9:51 AM, haenl...@gmail.com wrote:
> I'm sorry -- I think I chose a bad example. Let me start over again:
>
> I want to estimate a moderated regression model of the following form:
> y = a*x1 + b*x2 + c*x1*x2 + e
>
> Based on my understanding, including an interaction term (x1*x2) in
> the regression in addition to x1 and x2 leads to issues of
> multicollinearity, as x1*x2 is likely to covary to some degree with x1
> (and x2). One recommendation I have seen in this context is to use
> mean centering, but apparently this does not solve the problem (see:
> Echambadi, Raj and James D. Hess (2007), "Mean-centering does not
> alleviate collinearity problems in moderated multiple regression
> models," Marketing Science, 26 (3), 438-45). So my question is: which R
> function can I use to estimate this type of model?
>
> Sorry for the confusion caused by my previous message,
>
> Michael
>
> On Aug 3, 2010 3:42pm, David Winsemius wrote:
>> I think you are attributing to "collinearity" a problem that is due
>> to your small sample size. You are predicting 9 points with 3
>> predictor terms, and incorrectly concluding that there is some
>> "inconsistency" because you get an R^2 that is above some number you
>> deem surprising. (I got values between 0.2 and 0.4 on several runs.)
>>
>> Try it with a larger sample.
>>
>> # Multiple R-squared: 0.04269
>>
>> [original question and code quoted above]
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
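[Editor's note: for the archive, a minimal lm.ridge sketch along the lines Nikhil suggests; the data and the lambda grid are made up for illustration:

```r
library(MASS)

# Simulated data with an interaction term (arbitrary coefficients):
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + rnorm(100)
d  <- data.frame(y, x1, x2)

# Ridge regression over a grid of penalties; the interaction stays in the model:
rfit <- lm.ridge(y ~ x1 * x2, data = d, lambda = seq(0, 10, by = 0.1))

# Suggested lambda values (HKB, L-W, and generalized cross-validation):
select(rfit)

# Coefficients at the GCV-minimizing lambda:
coef(rfit)[which.min(rfit$GCV), ]
```
]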
Re: [R] Collinearity in Moderated Multiple Regression
[beginning of quoted message truncated]
> ...h the other two covariates is > 0.68, which is nowhere close
> to the 0.9 or above correlations that signal potential
> multicollinearity.
>
> HTH,
> Dennis
>
> [earlier messages in this thread quoted above]
[R] Aggregating data from two data frames
Dear all,

I'm working with two data frames. The first frame (agg_data) consists of two columns: agg_data[,1] is a unique ID for each row and agg_data[,2] contains a continuous variable. The second data frame (geo_data) consists of several columns. One of these columns (geo_data$ZCTA) corresponds to the unique ID in the first data frame. The problem is that only a subset of the unique IDs present in the first data frame also appears in the second data frame.

What I would like to do is to add another column to the second data frame (geo_data) that includes the value of the continuous variable from the first frame that corresponds to the unique ID. To put it differently, I want R to look at each row in the second data frame, look at the unique ID (geo_data$ZCTA), look for the same unique ID in the first data frame and then paste the value of the continuous variable as a new column into the second data frame. I hope I'm somewhat clear here ... Is there a convenient way of doing this?

Thanks very much in advance,

Michael
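[Editor's note: a sketch of the lookup described above with merge(); the column names ID and value for agg_data are assumptions, as the original post only refers to the columns by position:

```r
# Toy versions of the two data frames (made-up values):
agg_data <- data.frame(ID    = c("A", "B", "C", "D"),
                       value = c(1.5, 2.3, 0.7, 4.1))
geo_data <- data.frame(ZCTA = c("B", "D", "E"),
                       pop  = c(100, 250, 80))

# Left join: keep every row of geo_data, add the matching value
# from agg_data (NA where no match exists):
geo_data <- merge(geo_data, agg_data,
                  by.x = "ZCTA", by.y = "ID", all.x = TRUE)

# Equivalent one-liner that preserves geo_data's row order:
# geo_data$value <- agg_data$value[match(geo_data$ZCTA, agg_data$ID)]
```
]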
[R] Total effect of X on Y under presence of interaction effects
Dear all,

this is probably more a statistics question than an R question, but perhaps there is somebody who can help me nevertheless.

I'm running a regression with four predictors (a, b, c, d) and all their interaction effects using lm. Based on theory I assume that a influences y positively. In my output (see below) I see, however, a negative regression coefficient for a. But several of the interaction effects of a with b, c and d have positive signs. I don't really understand this. Do I have to add up the coefficient for the main effect and the ones of all interaction effects to get a total effect of a on y? Or am I doing something wrong here?

Thanks very much for your answer in advance,

Regards,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France

Call:
lm(formula = y ~ a * b * c * d)

Residuals:
    Min      1Q  Median      3Q     Max
-44.919  -5.184   0.294   5.232 115.984

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  27.3067     0.8181  33.379  < 2e-16 ***
a           -11.0524     2.0602  -5.365 8.25e-08 ***
b            -2.5950     0.4287  -6.053 1.47e-09 ***
c           -22.0025     2.8833  -7.631 2.50e-14 ***
d            20.5037     0.3189  64.292  < 2e-16 ***
a:b          15.1411     1.1862  12.764  < 2e-16 ***
a:c          26.8415     7.2484   3.703 0.000214 ***
b:c           8.3127     1.5080   5.512 3.61e-08 ***
a:d           6.6221     0.8061   8.215 2.33e-16 ***
b:d          -2.0449     0.1629 -12.550  < 2e-16 ***
c:d          10.0454     1.1506   8.731  < 2e-16 ***
a:b:c         1.4137     4.1579   0.340 0.733862
a:b:d        -6.1547     0.4572 -13.463  < 2e-16 ***
a:c:d       -20.6848     2.8832  -7.174 7.69e-13 ***
b:c:d        -3.4864     0.6041  -5.772 8.05e-09 ***
a:b:c:d       5.6184     1.6539   3.397 0.000683 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.913 on 12272 degrees of freedom
Multiple R-squared: 0.8845,  Adjusted R-squared: 0.8844
F-statistic:  6267 on 15 and 12272 DF,  p-value: < 2.2e-16
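[Editor's note: with a full factorial of interactions, the effect of a on y is not the a coefficient alone but the partial derivative dy/da, which depends on the values of b, c and d. A sketch of that arithmetic; the fitted object `model` and the evaluation point (b0, c0, d0) are assumptions:

```r
# Coefficients from a fitted model of the form model <- lm(y ~ a * b * c * d):
cf <- coef(model)

# Marginal effect of a, evaluated at a chosen point (arbitrary values here):
b0 <- 1; c0 <- 0.5; d0 <- 2
dyda <- cf["a"] +
        cf["a:b"] * b0 + cf["a:c"] * c0 + cf["a:d"] * d0 +
        cf["a:b:c"] * b0 * c0 + cf["a:b:d"] * b0 * d0 +
        cf["a:c:d"] * c0 * d0 +
        cf["a:b:c:d"] * b0 * c0 * d0
dyda
```

Evaluating dyda over the observed range of b, c and d shows where the total effect of a is positive or negative.]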
[R] Powerful PC to run R
Dear all,

I'm currently running R on my laptop -- a Lenovo ThinkPad X201 (Intel Core i7 CPU, M620, 2.67 GHz, 8 GB RAM). The problem is that some of my calculations run for several days, sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating.

I'm now thinking about buying a more powerful desktop PC or laptop. Can anybody advise me on the best configuration to run R as fast as possible? I will use this PC exclusively for R, so any other factors are of limited importance.

Thanks,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
[R] Help to improve existing R-Code
Dear all,

I have written a relatively brief piece of R code to run a series of simulations. Currently the code runs for a very long time (up to several days, depending on the conditions), and I suspect this is because it is not very efficiently written. I am, for example, relying on several for(...) loops which could probably be done much faster using a different way of programming.

I am looking for a consultant who could help me improve my code. The idea is that I send the code to the person, s/he works on improving it and then sends the improved version back to me. I think for an experienced programmer the job should not take more than 2-3 days (probably less), but this is to be decided once the person has looked at the code.

In case you are interested, please send me a brief message so that I can provide you with more details.

Thanks,

Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
Re: [R] Help to improve existing R-Code
I'm looking to hire someone -- sorry for not having been more precise!

Michael

On Fri, May 27, 2011 at 1:23 PM, Duncan Murdoch wrote:
> On 11-05-27 3:23 AM, Michael Haenlein wrote:
>
>> [original message quoted above]
>
> Your message is ambiguous: are you asking for someone to volunteer 2-3
> days to help you, or are you trying to hire someone?
>
> Duncan Murdoch
[R] System of related regression equations
Dear all,

I would like to estimate a system of regression equations of the following form:

y1 = a1 + b1*x1 + b2*x2 + e1
y2 = a2 + c1*y1 + c2*x2 + c3*x3 + e2

Specifically, the dependent variable of Equation 1 appears as an independent variable in Equation 2. Additionally, some independent variables that appear in Equation 1 are also included in Equation 2. I assume that I cannot estimate these two regressions separately using lm. Is there an efficient way to estimate these equations?

Thanks very much in advance,

Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France
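Because y1 is endogenous in Equation 2, single-equation OLS for Equation 2 is generally inconsistent; instrumental-variable estimators such as 2SLS (or 3SLS for joint estimation) are the standard remedy, and the CRAN package systemfit implements them. A minimal sketch, assuming the variables live in a data frame `dat` (a placeholder name) and that x1, x2, x3 are the exogenous variables used as instruments:

```r
# Sketch: joint 2SLS estimation of the two equations with systemfit (CRAN).
# 'dat' is a placeholder for the actual data frame; the instrument set
# (all exogenous regressors) is an assumption about the model.
library(systemfit)

eq1 <- y1 ~ x1 + x2
eq2 <- y2 ~ y1 + x2 + x3

fit <- systemfit(list(eq1 = eq1, eq2 = eq2),
                 method = "2SLS",
                 inst   = ~ x1 + x2 + x3,   # all exogenous variables
                 data   = dat)
summary(fit)
```

Replacing method = "2SLS" by "3SLS" additionally exploits the cross-equation error correlation.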
[R] Effect size in multiple regression
Dear all,

Is there a convenient way to determine the effect size for a regression coefficient in a multiple regression model? I have a model of the form lm(y ~ A*B*C*D) and would like to determine Cohen's f2 (http://en.wikipedia.org/wiki/Effect_size) for each predictor without having to do it manually.

Thanks,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
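One way to automate this in base R: for each term, Cohen's f2 can be computed as (R2_full - R2_reduced) / (1 - R2_full), where the reduced model drops that term. A hedged sketch (the helper function and the simulated data are illustrations, not an established API; note that dropping a main effect while keeping its interactions is statistically questionable, so interpret main-effect values with care):

```r
# Sketch: Cohen's f2 for each term of a fitted lm(), via
# f2 = (R2_full - R2_reduced) / (1 - R2_full).
cohens_f2 <- function(full_model) {
  r2_full <- summary(full_model)$r.squared
  labels  <- attr(terms(full_model), "term.labels")
  sapply(labels, function(tm) {
    reduced <- update(full_model, as.formula(paste(". ~ . -", tm)))
    (r2_full - summary(reduced)$r.squared) / (1 - r2_full)
  })
}

# simulated data, purely for illustration
set.seed(1)
d <- data.frame(A = rnorm(100), B = rnorm(100))
d$y <- d$A + 0.5 * d$B + rnorm(100)
fit <- lm(y ~ A * B, data = d)
round(cohens_f2(fit), 3)   # one f2 value per term: A, B, A:B
```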
[R] Testing equality of coefficients in coxph model
Dear all,

I'm running a coxph model of the form:

coxph(Surv(Start, End, Death.ID) ~ x1 + x2 + a1 + a2 + a3)

Within this model, I would like to compare the influence of x1 and x2 on the hazard rate. Specifically, I am interested in testing whether the estimated coefficient for x1 is equal (or not) to the estimated coefficient for x2. I was thinking of using a Chow test for this, but the Chow test appears to work for linear regression only (see: http://en.wikipedia.org/wiki/Chow_test). Another option I was thinking of is to estimate an alternative model in which the coefficients for x1 and x2 are constrained to be equal, and to compare the fit of such a constrained model with that of an unconstrained one. But again, I'm not sure how this can be done using coxph.

Could anyone help me out on this please?

Thanks,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
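Both ideas can be carried out directly with survival: a Wald test of the contrast beta_x1 - beta_x2 = 0 using the estimated covariance matrix, or a likelihood-ratio test against the constrained model, which is obtained simply by entering the sum x1 + x2 as one covariate. A sketch on simulated stand-in data (the data-generating step is an illustration only):

```r
# Sketch: two tests of H0: beta_x1 == beta_x2 in a coxph fit.
library(survival)

set.seed(1)
n   <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  a1 = rnorm(n), a2 = rnorm(n), a3 = rnorm(n))
dat$time  <- rexp(n, rate = exp(0.5 * dat$x1 + 0.3 * dat$x2))
dat$death <- 1                     # no censoring in this toy example

fit <- coxph(Surv(time, death) ~ x1 + x2 + a1 + a2 + a3, data = dat)

# (1) Wald test of the linear contrast beta_x1 - beta_x2 = 0
b <- coef(fit); V <- vcov(fit)
z <- (b["x1"] - b["x2"]) /
  sqrt(V["x1", "x1"] + V["x2", "x2"] - 2 * V["x1", "x2"])
p_wald <- 2 * pnorm(-abs(z))

# (2) Likelihood-ratio test against the constrained model in which
# x1 and x2 share one coefficient (enter their sum as one covariate)
fit0  <- coxph(Surv(time, death) ~ I(x1 + x2) + a1 + a2 + a3, data = dat)
lrt   <- 2 * (fit$loglik[2] - fit0$loglik[2])
p_lrt <- pchisq(lrt, df = 1, lower.tail = FALSE)
```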
[R] Allocation of data points to groups based on membership probabilities
Dear all,

I have a matrix that provides, for a series of data points, the probability that each of these points belongs to a certain group. Take the following example, which represents 20 data points and their membership probabilities for five groups (A-E):

set.seed(1)
probs <- matrix(runif(100), nrow=20, dimnames=list(c(), c("A","B","C","D","E")))

In addition, I know how large each group should be. Assume, for example, that the group sizes in the aforementioned example are 5, 4, 1, 6 and 4 for A, B, C, D and E respectively.

I would like to allocate individuals to the groups so that (a) each group has the size it is supposed to have and (b) all data points are part of the group where they have a high probability of belonging. For some data points this allocation is straightforward, because one group membership probability is much larger than the others. But for others, two or more probabilities are very similar, which means that a data point could be allocated to either one group or the other. I guess it should be possible to write some iterative code or an optimization routine that can do what I would like to do, but I do not know how. Does anyone have an idea how this could be done?

Thanks very much in advance,

Michael Haenlein

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
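One exact way to satisfy both requirements is to treat this as a linear assignment problem: replicate each group's probability column once per available slot, then find the one-to-one assignment of points to slots that maximizes the total membership probability. A sketch using the CRAN package clue (the choice of package is an assumption; lpSolve would work similarly):

```r
# Sketch: fixed-size group allocation as a linear assignment problem,
# solved with the Hungarian algorithm from the 'clue' package (CRAN).
library(clue)

set.seed(1)
probs <- matrix(runif(100), nrow = 20,
                dimnames = list(c(), c("A", "B", "C", "D", "E")))
sizes <- c(A = 5, B = 4, C = 1, D = 6, E = 4)

slots <- rep(colnames(probs), times = sizes)   # one column per group slot
cost  <- probs[, slots]                        # 20 x 20 point-by-slot matrix
sol   <- solve_LSAP(cost, maximum = TRUE)      # optimal assignment
group <- slots[sol]                            # group of each data point
table(group)                                   # matches the required sizes
```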
[R] Binary optimization problem in R
Dear all,

I would like to solve a problem similar to a multiple knapsack problem and am looking for a function in R that can help me. Specifically, my situation is as follows: I have a list of n items which I would like to allocate to m groups with fixed size. Each item has a certain profit value and this profit depends on the group the item is in. My problem is to allocate the items to groups so that the overall profit is maximized while respecting the fixed size of each group.

Take the following example with 20 items (n=20) and 5 groups (m=5):

set.seed(1)
profits <- matrix(runif(100), nrow=20)
size <- c(2,3,4,5,6)

The matrix "profits" describes the profit of each item when it is allocated to a certain group. For example, when item 1 is allocated to group 1 it generates a profit of 0.26550866. However, when item 1 is allocated to group 2 it generates a profit of 0.93470523. The vector "size" describes the size of each group. So group 1 can contain 2 items, group 2 can contain 3 items, group 3 can contain 4 items, etc.

I think this is probably something that could be done with constrOptim() but I'm not exactly sure how. Any help is very much appreciated!

Thanks very much in advance,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
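This is a transportation-type linear program (each item supplies one unit, each group demands exactly its size) rather than a smooth problem for constrOptim(). A sketch with the CRAN package lpSolve, which solves it exactly:

```r
# Sketch: fixed-size group allocation as a transportation LP with
# lpSolve (CRAN). Each item goes to exactly one group; each group is
# filled to exactly its prescribed size.
library(lpSolve)

set.seed(1)
profits <- matrix(runif(100), nrow = 20)
size    <- c(2, 3, 4, 5, 6)

sol <- lp.transport(cost.mat  = profits,
                    direction = "max",
                    row.signs = rep("==", 20), row.rhs = rep(1, 20),
                    col.signs = rep("==", 5),  col.rhs = size)

sol$objval                                        # maximal total profit
allocation <- apply(sol$solution, 1, which.max)   # group of each item
```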
[R] Cannot allocate vector of size x
Dear all,

I am running a simulation in which I randomly generate a series of vectors to test whether they fulfill a certain condition. In most cases there is no problem, but from time to time the (randomly) generated vectors are too large for my system and I get the error message: "Cannot allocate vector of size x". The problem is that in those cases my simulation stops and I have to restart it manually. What I would like to do is simply ignore that the error happened (or perhaps report that it did) and then continue with another (randomly) generated vector. So my question: is there a way to keep R from stopping in such a case, and to just continue the program as if nothing happened?

I hope I'm making myself clear here ...

Thanks,

Michael
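The standard tool for this is tryCatch(): wrap one simulation replication in it, record failures, and carry on. A sketch, in which one_replication() is only a stand-in for the real simulation step and deliberately requests an absurd vector size now and then:

```r
# Sketch: skip failed replications instead of aborting the whole run.
# one_replication() is a placeholder for the user's own simulation step.
one_replication <- function() {
  n <- sample(c(1e3, 1e15), 1)    # occasionally an absurdly large request
  v <- numeric(n)                 # may fail: "cannot allocate vector ..."
  mean(v)
}

run_safely <- function(n_runs) {
  results <- rep(NA_real_, n_runs)
  for (i in seq_len(n_runs)) {
    results[i] <- tryCatch(
      one_replication(),
      error = function(e) {
        message("run ", i, " skipped: ", conditionMessage(e))
        NA_real_                  # record the failure and continue
      }
    )
  }
  results
}

set.seed(1)
res <- run_safely(10)             # NA marks the skipped runs
```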
[R] Help with estimating copulas
Dear all,

I am looking to hire a consultant/adviser who can help me to get my head around copulas. For a person familiar with the topic (http://en.wikipedia.org/wiki/Copula_(statistics)) who knows the copula package or something similar (http://www.jstatsoft.org/v21/i04/paper), I think the job should not take more than a couple of hours, one day maximum. Please contact me in case you are interested so that I can provide you with additional details on the problem I'd like to solve.

Thanks,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
[R] Convert continuous variable into discrete variable
Dear all,

I have a continuous variable that can take on values between 0 and 100, for example:

x <- runif(100, 0, 100)

I also have a second variable that defines a series of thresholds, for example:

y <- c(3, 4.5, 6, 8)

I would like to convert my continuous variable into a discrete one using the threshold variable:

If x is between 0 and 3 the discrete variable should be 1
If x is between 3 and 4.5 the discrete variable should be 2
If x is between 4.5 and 6 the discrete variable should be 3
If x is between 6 and 8 the discrete variable should be 4
If x is larger than 8 the discrete variable should be 5

Is there a straightforward way of doing this (besides working with several if statements in a row)?

Thanks,

Michael
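Base R does this in one vectorized call with findInterval(), or equivalently with cut() after extending the breaks vector (the two differ only in which side of an exact threshold value is closed, which does not matter for continuous data):

```r
# Sketch: threshold-based discretization without any if statements.
set.seed(1)
x <- runif(100, 0, 100)
y <- c(3, 4.5, 6, 8)

disc  <- findInterval(x, y) + 1L                 # 1 for x < 3, ..., 5 for x > 8
disc2 <- cut(x, breaks = c(0, y, 100), labels = FALSE)

table(disc)           # counts per category
```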
[R] Regression model when dependent variable can only take positive values
Dear all,

I would like to run a regression of the form lm(y ~ x1 + x2) where the dependent variable y can only take positive values. Assume, for example, that y is the height of a person (measured in cm), x1 is the gender (measured as a binary indicator with 0=male and 1=female) and x2 is the age of the person (measured in years). When I run a simple lm(y ~ x1 + x2), I obtain a negative intercept. I interpret this to mean that a person who is male (x1=0) and just born (x2=0) has a negative height, which evidently does not make sense. I therefore assume that my estimates might be biased and that I need some other form of estimation that takes account of the fact that y>0 for all observations. Could anybody please tell me which type of regression would be most recommendable for this type of analysis?

Thanks very much in advance,

Michael
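One standard option for a strictly positive continuous outcome is a GLM with a log link (for example a Gamma family), which keeps every fitted mean positive; modelling log(y) with lm() is a common alternative. A sketch on simulated data (the data-generating step is an illustration only):

```r
# Sketch: regression with a strictly positive dependent variable.
set.seed(1)
n  <- 200
x1 <- rbinom(n, 1, 0.5)                          # gender dummy
x2 <- runif(n, 0, 80)                            # age in years
y  <- exp(4 + 0.1 * x1 + 0.01 * x2) * rgamma(n, shape = 20, rate = 20)

fit <- glm(y ~ x1 + x2, family = Gamma(link = "log"))
all(fitted(fit) > 0)                             # fitted means stay positive

fit2 <- lm(log(y) ~ x1 + x2)                     # log-normal alternative
```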
[R] Extract BIC for coxph
Dear all,

Is there a function similar to extractAIC based on which I can extract the BIC (Bayesian Information Criterion) of a coxph model? I found some functions that provide the BIC in other packages, but none of them seems to work with coxph.

Thanks,

Michael
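The BIC can also be computed by hand from the partial log-likelihood stored in the fit; for Cox models the penalty term is often based on the number of events rather than the number of observations (a convention, not the only choice). A sketch using the lung data shipped with survival:

```r
# Sketch: event-based BIC for a coxph fit, computed by hand.
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

ll  <- fit$loglik[2]                    # log partial likelihood at the MLE
k   <- length(coef(fit))                # number of estimated coefficients
bic <- -2 * ll + k * log(fit$nevent)    # penalty uses the number of events
bic
```

extractAIC() also accepts coxph fits, so the same quantity should be obtainable via extractAIC(fit, k = log(fit$nevent)).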
[R] Pearson correlation of sum of variables
Dear all,

This is more a math-related question, but perhaps someone can help me nevertheless. Assume I have two random variables, A and B, and that I know the Pearson correlation coefficient between them, cor(A,B). I now define C = 1-(A+B). Is there some way to determine cor(C,A) and cor(C,B)? I know that the Pearson correlation coefficient is not additive, but perhaps there is still some way to solve this.

Thanks very much in advance,

Michael
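The covariance algebra can be written out directly: with C = 1 - (A + B), cov(C,A) = -var(A) - cov(A,B) and var(C) = var(A) + var(B) + 2*cov(A,B), hence cor(C,A) = -(var(A) + cov(A,B)) / (sd(A) * sd(C)). Note that this needs the variances of A and B as well; cor(A,B) alone is not sufficient. A quick numerical check on simulated data:

```r
# Numerical check of the covariance algebra for C = 1 - (A + B):
#   cov(C, A) = -var(A) - cov(A, B)
#   var(C)    =  var(A) + var(B) + 2*cov(A, B)
set.seed(1)
A <- rnorm(1e4, sd = 1)
B <- 0.5 * A + rnorm(1e4, sd = 2)   # correlated with A
C <- 1 - (A + B)

cor_CA_formula <- -(var(A) + cov(A, B)) /
  (sd(A) * sqrt(var(A) + var(B) + 2 * cov(A, B)))

all.equal(cor(C, A), cor_CA_formula)
```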
[R] Pearson chi-square test
Dear all,

I have some trouble understanding the chisq.test function. Take the following example:

set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels=FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels=FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels=FALSE)
x <- table(A,B)
y <- table(A,C)

When I calculate the test statistic by hand (following http://en.wikipedia.org/wiki/Pearson's_chi-square_test#Calculating_the_test-statistic) I get a value of approximately 75.9:

sum((x-y)^2/y)

But when I do chisq.test(x,y) I get a value of 12.2, while chisq.test(y,x) gives a value of 10.3. I understand that I must be doing something wrong here, but I'm not sure what.

Thanks,

Michael
Re: [R] Pearson chi-square test
Dear Michael,

Thanks very much for your answers! The purpose of my analysis is to test whether the contingency table x is different from the contingency table y. Or, to put it differently, whether there is a significant difference between the joint distribution of A&B and that of A&C. Based on your answer I'm wondering whether the best way to do this is really chisq.test. Or is there perhaps a different function or package I should use altogether?

Thanks,

Michael

-----Original Message-----
From: Meyners, Michael [mailto:meyner...@pg.com]
Sent: Dienstag, 27. September 2011 17:00
To: Michael Haenlein; r-help@r-project.org
Subject: RE: [R] Pearson chi-square test

Just for completeness: the manual calculation you'd want is most likely

sum((x-y)^2 / (x+y))

(that's one you can find on the Wikipedia link you provided). To get the same from chisq.test, try something like

chisq.test(data.frame(x,y)[,c(3,6)])

(there are surely smarter ways, but at least it works here). Note that something like chisq.test(as.vector(x), as.vector(y)) will give a different test, i.e. one based on a contingency table of x cross y.

M.

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Meyners, Michael
> Sent: Tuesday, September 27, 2011 13:28
> To: Michael Haenlein; r-help@r-project.org
> Subject: Re: [R] Pearson chi-square test
>
> Not sure what you want to test here with two matrices, but reading the
> manual helps here as well:
>
> y: a vector; ignored if x is a matrix.
>
> x and y are matrices in your example, so it comes as no surprise that
> you get different results. On top of that, your manual calculation is
> not correct if you want to test whether two samples come from the same
> distribution (so don't be surprised if R still gives a different
> value...).
>
> HTH, Michael
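Putting the suggestion above into runnable form: the two-sample cell-by-cell statistic sum((x-y)^2/(x+y)), with a guard for cells that are empty in both tables. The degrees of freedom used here (number of non-empty cells minus one) are an assumption for two independent samples of equal total size:

```r
# Sketch: comparing two contingency tables with the statistic suggested
# in the thread, sum((x - y)^2 / (x + y)).
set.seed(1)
A <- cut(runif(100), c(0.0, 0.35, 0.50, 0.65, 1.00), labels = FALSE)
B <- cut(runif(100), c(0.0, 0.25, 0.40, 0.75, 1.00), labels = FALSE)
C <- cut(runif(100), c(0.0, 0.25, 0.50, 0.80, 1.00), labels = FALSE)
x <- table(A, B)
y <- table(A, C)

ok   <- (x + y) > 0                      # skip cells empty in both tables
stat <- sum(((x - y)^2 / (x + y))[ok])
df   <- sum(ok) - 1                      # assumed degrees of freedom
pval <- pchisq(stat, df, lower.tail = FALSE)
```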
[R] Time-dependent covariates in coxph model
Dear all,

I have a question about time-dependent covariates in a coxph model. Specifically, I am wondering whether it is possible to give more recent events a higher weight when constructing time-dependent covariates.

Assume I have a sample of cancer patients and I would like to test whether the number of treatments a patient received has an impact on survival time. For each patient in my sample I know (a) the date when the patient was diagnosed with cancer, (b) all the dates on which a treatment took place and (c) the date of death or, alternatively, the date on which the observation window ends.

Take the following example: Bob is diagnosed with cancer on 01/01/1990, has three treatments (on 01/01/1993, 01/01/1995 and 01/01/1997) and dies on 01/01/1999. In order to incorporate the time-dependent covariates into my model, I transform this into four separate data points:

(1) Start: 01/01/1990, End: 01/01/1993, Number of treatments: 0
(2) Start: 01/01/1993, End: 01/01/1995, Number of treatments: 1
(3) Start: 01/01/1995, End: 01/01/1997, Number of treatments: 2
(4) Start: 01/01/1997, End: 01/01/1999, Number of treatments: 3

The problem is that in this formulation all treatments count the same way, no matter when they took place. I would like to introduce some form of discount factor that takes account of the fact that the potential impact of each treatment decays over time. If that discount factor is d, I would like to model the following four data points:

(1) Start: 01/01/1990, End: 31/12/1992, Number of treatments: 0
(2) Start: 01/01/1993, End: 31/12/1994, Number of treatments: 1
(3) Start: 01/01/1995, End: 31/12/1996, Number of treatments: 1*d^2 + 1
(4) Start: 01/01/1997, End: 01/01/1999, Number of treatments: 1*d^4 + 1*d^2 + 1

d^n hereby accounts for the fact that the treatment was already n years ago at the start of the interval.

My questions: Is it possible to include such a formulation in a coxph model? And is there a way to estimate the optimal d, so that I can see how fast the effect of a treatment decays over time, given the data I have?

Thanks,

Michael
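One workable approach: write a function that builds the (start, stop] data set with the discounted treatment count for a given d, then profile the Cox partial likelihood over a grid of d values and keep the d with the highest likelihood. A sketch (the single-patient data just reproduce Bob's example; the profiling step is shown in outline because it needs the full multi-patient data set to be meaningful):

```r
# Sketch: discounted time-dependent treatment count for one decay factor d.
library(survival)

make_intervals <- function(d) {
  treat_years <- c(1993, 1995, 1997)          # Bob's treatment dates
  starts <- c(1990, treat_years)
  stops  <- c(treat_years, 1999)
  n_treat <- sapply(starts, function(s)
    sum(d ^ (s - treat_years[treat_years <= s])))
  data.frame(start = starts, stop = stops,
             death = c(0, 0, 0, 1), n_treat = n_treat)
}
make_intervals(0.5)$n_treat    # 0, 1, 1 + d^2, 1 + d^2 + d^4 as in the post

# Outline of the profiling step over a grid of d values (needs the
# full multi-patient counting-process data set, not just one patient):
# profile_d <- function(d, dat_builder) {
#   dat <- dat_builder(d)
#   coxph(Surv(start, stop, death) ~ n_treat, data = dat)$loglik[2]
# }
# d_grid <- seq(0.05, 1, by = 0.05)
# ll     <- sapply(d_grid, profile_d, dat_builder = make_intervals)
# d_hat  <- d_grid[which.max(ll)]
```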
[R] Curve fitting, probably splines
Dear all,

This is probably more related to statistics than to [R], but I hope someone can give me an idea how to solve it nevertheless. Assume I have a variable y that is a function of x: y = f(x). I know the average value of y for different intervals of x. For example, I know that in the interval [0;x1] the average y is y1, in the interval [x1;x2] the average y is y2, and so forth. I would like to find a line of minimum curvature such that the average values of y in each interval correspond to y1, y2, ...

My idea was to use (cubic) splines. But the problem I have seems somewhat different from what is usually done with splines. As far as I understand it, splines help to find a curve that passes through a set of given points. But I don't have any points; I only have average values of y per interval. If you have any suggestions on how to solve this, I'd love to hear them.

Thanks very much in advance,

Michael
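One way to get started: write f as a combination of cubic B-spline basis functions, average each basis function over every interval numerically, and solve the resulting square linear system so that the curve reproduces the interval means exactly. The bounds, averages and knot positions below are made-up illustration values, and a true "minimum curvature" solution would add a roughness penalty on top of this exact-matching sketch:

```r
# Sketch: a spline whose interval averages match given values.
library(splines)

bounds <- c(0, 1, 2, 3, 4, 5)            # interval endpoints x0..x5 (made up)
ybar   <- c(1.0, 1.8, 2.1, 1.9, 1.2)     # known interval averages (made up)

# fixed knots so every call yields the same 5 basis functions
# (note: without an intercept column this basis forces f(0) = 0)
basis <- function(x) bs(x, knots = c(2, 3), Boundary.knots = c(0, 5))

# M[i, j] = numerical average of basis function j over interval i
M <- t(sapply(seq_along(ybar), function(i) {
  g <- seq(bounds[i], bounds[i + 1], length.out = 201)
  colMeans(basis(g))
}))

coefs <- solve(M, ybar)                  # reproduce the averages exactly
f     <- function(x) as.vector(basis(x) %*% coefs)
```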
[R] Consultant to program R-code dealing with social networks
Dear all,

I am looking for a consultant/programmer to write a relatively simple piece of R code for me. Specifically, I have about 50 social networks. These networks have between 5,000 and 5 million nodes and between 30,000 and 70 million edges. The code should (a) read one network into R, (b) draw a snowball sample of size x out of the network (e.g., a snowball sample of 1,000 nodes), (c) determine some basic network statistics for that sample and (d) save the sample and the network statistics into two files for further use. Let me know by email in case you are interested so that we can speak about the remaining details.

Thanks,

Michael

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
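For reference, steps (a)-(d) can be sketched with igraph (CRAN); the file names below are placeholders and the snowball routine is one simple breadth-first variant among several possible definitions:

```r
# Sketch: read a network, draw a snowball sample, compute statistics,
# save both. File names are placeholders.
library(igraph)

g <- read_graph("network01.edgelist", format = "edgelist", directed = FALSE)

# snowball sample: breadth-first expansion from one random seed
# until x nodes have been collected (or the component is exhausted)
snowball <- function(g, x) {
  nodes <- as.integer(sample(V(g), 1))
  while (length(nodes) < x) {
    nbrs <- setdiff(unlist(adjacent_vertices(g, nodes)), nodes)
    if (length(nbrs) == 0) break
    nodes <- c(nodes, head(nbrs, x - length(nodes)))
  }
  induced_subgraph(g, nodes)
}

samp  <- snowball(g, 1000)
stats <- data.frame(nodes = vcount(samp), edges = ecount(samp),
                    density = edge_density(samp),
                    mean_degree = mean(degree(samp)))

write_graph(samp, "sample01.edgelist", format = "edgelist")
write.csv(stats, "sample01_stats.csv", row.names = FALSE)
```

For networks with tens of millions of edges, reading the edge list once and keeping it in memory per file is the main cost; everything else above is fast.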
[R] Regression with very high number of categorical variables
Dear all,

I would like to run a simple regression model y ~ x1+x2+x3+... The problem is that I have a lot of independent variables (xi) -- around one hundred -- and that some of them are categorical with a lot of categories (like, for example, ZIP code). One straightforward way would be to (a) transform all categorical variables into 0/1 dummies and (b) enter all the variables into an lm model. But I'm not sure whether this is very efficient, especially since the analysis is exploratory in nature and I expect that many of the xi will have no significant impact on y. Is there an R library that can handle such a setting? I have read about "hierarchical Bayesian variance components models" that have been used with ZIP code data (www.jstor.org/stable/10.2307/4129723), but I'm not sure to what extent there is a function in R to do this in a straightforward manner.

Thanks,

Michael
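Two common approaches for this setting, sketched below: treat high-cardinality factors such as ZIP code as random effects (a variance-components model, the non-Bayesian cousin of the cited paper's approach) with lme4, or use penalized regression on a sparse dummy expansion with glmnet. Variable and data-frame names are placeholders:

```r
# Sketch: ZIP code as a random effect instead of thousands of dummies.
# 'dat' with columns y, x1..x3 and zip is a placeholder data frame.
library(lme4)

fit <- lmer(y ~ x1 + x2 + x3 + (1 | zip), data = dat)
summary(fit)

# Alternative for exploratory selection among ~100 predictors:
# penalized regression on a sparse dummy matrix with glmnet.
# library(glmnet)
# X     <- sparse.model.matrix(~ . - y, data = dat)
# cvfit <- cv.glmnet(X, dat$y)       # lasso with cross-validated lambda
```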