[R] Java Exception error while reading large data in R from DB using RJDBC.
Dear List,

I get a Java exception while reading a large table into R from a database (Vectorwise) over an RJDBC connection. I have tested the same connection (conn in the code below) with small tables and was able to fetch them. Please suggest where I am going wrong, or an alternative way to read a large DB table.

drv <- JDBC(paste(db_driver, sep = ""), paste(db_jar_file, sep = ""), identifier.quote="`")
conn <<- dbConnect(drv, paste(db_server, sep = ""), paste(db_server_lgn, sep = ""), paste(db_server_pwd, sep = ""))
s <- sprintf("select * from cypress_modeldev_account_info")
temp <- dbGetQuery(conn, s)

Error in .jcheck() : Java Exception .jcall(rp, "I", "fetch", stride)

--
View this message in context: http://r.789695.n4.nabble.com/Java-Exception-error-while-reading-large-data-in-R-from-DB-using-RJDBC-tp4647844.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Solving binary integer optimization problem
Hi,

I am new to using R for optimization problems. I have a set of communication channels with limited capacity and two types of cost, fixed and variable. Each channel has an expected gain for a single communication. I want to determine the optimal number of communications for each channel, maximizing ROI (return on investment) with the overall budget as a constraint. 6 is the budget allocated.

Channel  Fixed_Cost  Variable_Cost  Capacity  Expected_Gain
C1       400         2.5            5000      0.25
C2       1           0              3         0.3
C3       4000        0.15           2         0.15
C4       2000        2              1         0.36
C5       100         3              4000      0.09

Channel_Select <- data.frame(Channel=c('c1','c2','c3','c4','c5'),
  Fixed_Cost=c(400,5000,4000,2000,100),
  Variable_Cost=c(2.5,0,0.15,2,3),
  Capacity=c(5000,3,2,1,4000),
  Expected_gain=c(0.25,0.3,0.15,0.36,0.09))

Let x1,x2,x3,x4,x5 be the decision variables for channels c1,c2,c3,c4,c5, and z1,z2,z3,z4,z5 the binary indicator variables showing whether a channel has any communication allocated.

max((0.25*x1+0.30*x2+0.15*x3+0.36*x4+0.09*x5)-(2.5*x1+0*x2+0.15*x3+2*x4+3*x5+400*z1+1*z2+4000*z3+2000*z4+100*z5)/(2.5*x1+0*x2+0.15*x3+2*x4+3*x5+400*z1+1*z2+4000*z3+2000*z4+100*z5))

Constraints:
(2.5*x1+0*x2+0.15*x3+2*x4+3*x5+400*z1+1*z2+4000*z3+2000*z4+100*z5) <= 6 ##Budget Constraint
x1-5000*z1<=0
x2-3*z2<=0
x3-2*z3<=0
x4-1*z4<=0
x5-4000*z5<=0
x1>=200
x2>=100
x3>=100
x4>=500
x5>=0

I tried the lp function from lpSolve but was not able to set objective.in for the objective function. Any help or hint is welcome!

--
View this message in context: http://r.789695.n4.nabble.com/Solving-binary-integer-optimization-problem-tp4639891.html
[R] Putting directory path as a parameter
Hi List,

I am new to R, so this may be simple. I want to store a directory path as a parameter, which in turn is to be used while reading and writing data from csv files. How can I use dir, as defined in the example below, while reading the csv file?

Example:

dir <- "C:/Users/Desktop" #location of file
temp_data <- read.csv("dir/bs_dev_segment_file.csv")

If I run this it shows errors:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'dir/bs_dev_segment_file.csv': No such file or directory

Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Putting-directory-path-as-a-parameter-tp4043092p4043092.html
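In the example above, "dir/bs_dev_segment_file.csv" is a literal string, so R looks for a folder that is actually named dir instead of using the variable. A minimal sketch of one way to build the path from the variable with base R's file.path(); the temporary directory and the small data frame are stand-ins made up for this illustration so that the snippet runs anywhere:

```r
# Hypothetical stand-in for "C:/Users/Desktop" so the example is runnable
dir <- tempdir()

# file.path() joins the directory variable and the file name with "/"
csv_path <- file.path(dir, "bs_dev_segment_file.csv")

# Write a small made-up file, then read it back through the same path
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), csv_path, row.names = FALSE)
temp_data <- read.csv(csv_path)
print(temp_data)
```

paste(dir, "bs_dev_segment_file.csv", sep = "/") would produce the same string; file.path() just makes the intent clearer.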
[R] Similar function for Redun() from Hmisc ?
Hi List,

I am working on a large data frame (number of records = 35000 and number of variables = 160) and using redun() to drop variables before they go into a model.

V <- redun(~., data = data.frame, r2 = 0.8)

It takes an enormously long time to execute. Is there anything wrong in the script? Please suggest any other similar function for dropping redundant variables.

Thanks in advance!
~A

--
View this message in context: http://r.789695.n4.nabble.com/Similar-function-for-Redun-from-Hmisc-tp4095455p4095455.html
[R] Any function\method to use automatically Final Model after bootstrapping using boot.stepAIC()
Hi List,

Being new to R, I am trying to apply boot.stepAIC() for model selection by bootstrapping the stepAIC() procedure. I have gone through the discussion in various threads on variable selection methods and understood the pros and cons of the various methods; I am also going through the regression modelling strategies in rms. I want to read the final model, its formula, or its list of variables automatically after boot.stepAIC().

n <- 200
x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x3 <- runif(n, -3, 3)
x4 <- runif(n, -3, 3)
x5 <- factor(sample(letters[1:2], n, rep = TRUE))
eta <- 0.1 + 1.6 * x1 - 2.5 * as.numeric(as.character(x5) == levels(x5)[1])
y1 <- rbinom(n, 1, plogis(eta))
data <- data.frame(y1,x1, x2, x3, x4, x5)
glmFit1 <- glm(y1 ~ x1 + x2 + x3 + x4 + x5, family = binomial, data = data)
bglmfit <- boot.stepAIC(glmFit1, data, B = 50)
bglmfit

In the summary of bootstrapping the 'stepAIC()' procedure, the following information is listed:

Initial Model: y1 ~ x1 + x2 + x3 + x4 + x5
Final Model: y1 ~ x1 + x5

Is there any function or method for using the final model from bootstrapping the 'stepAIC()' procedure, like the OrigStepAIC model shown below?

n <- 200
x1 <- runif(n, -3, 3)
x5 <- factor(sample(letters[1:2], n, rep = TRUE))
eta <- 0.1 + 1.6 * x1 - 2.5 * as.numeric(as.character(x5) == levels(x5)[1])
data1 <- data.frame(x1, x5)
data1$probscore <- predict(bglmfit$OrigStepAIC , data1)

Is there any way to read the variables or the formula of the final model?

Thanks in advance!
Regards,
~Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Any-function-method-to-use-automatically-Final-Model-after-bootstrapping-using-boot-stepAIC-tp4119050p4119050.html
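Since the post already calls predict() on bglmfit$OrigStepAIC, that component behaves like a fitted glm, and the generic formula() and all.vars() functions can read a fitted model's formula and variable names. A minimal sketch under that assumption; to stay runnable without the bootStepAIC package it uses stats::step() as a stand-in for the stepwise fit, and the names dat, final_fit and refit are made up for this illustration:

```r
set.seed(42)
n <- 200
x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x5 <- factor(sample(letters[1:2], n, replace = TRUE))
eta <- 0.1 + 1.6 * x1 - 2.5 * (x5 == levels(x5)[1])
y1 <- rbinom(n, 1, plogis(eta))
dat <- data.frame(y1, x1, x2, x5)

full_fit <- glm(y1 ~ x1 + x2 + x5, family = binomial, data = dat)

# Stand-in for bglmfit$OrigStepAIC: any fitted glm works the same way
final_fit <- step(full_fit, trace = 0)

final_formula <- formula(final_fit)          # formula of the selected model
final_vars <- all.vars(final_formula)[-1]    # predictor names (response dropped)
print(final_formula)
print(final_vars)

# The extracted formula can be reused directly, for example to refit
refit <- glm(final_formula, family = binomial, data = dat)
```

With the objects from the post, the equivalent calls would be formula(bglmfit$OrigStepAIC) and all.vars(formula(bglmfit$OrigStepAIC)).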
[R] Calculating the probability of an event at time "t" from a Cox model fit
Dear R-users,

I would like to determine the probability of an event at a specific time using a Cox model fit. On the development sample I am able to get the probability of an event at time point t. I need the probability of an event at a specific time for a scoring dataset, which will have only the covariates and not the response variables. Here is the sample code:

n = 1000
beta1 = 2; beta2 = -1;
lambdaT = .02 # baseline hazard
lambdaC = .4 # hazard of censoring
x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2))
C = rweibull(n, shape=1, scale=lambdaC) #censoring time
time = pmin(T,C) #observed time is min of censored and true
event = time==T # set to 1 if event is observed
dataphr=data.frame(time,event,x1,x2)
library(survival)
fit_coxph <- coxph(Surv(time, event)~ x1 + x2 , method="breslow")
library(peperr)
predictProb.coxph(fit_coxph, Surv(dataphr$time, dataphr$event), dataphr, 0.003)

Using the predictProb.coxph function, the probability of an event at time t is estimated for Cox model fits. I want to estimate this probability on the scoring dataset score_data below, with covariates x1 and x2. Is there any way to get these probabilities? The predictProb.coxph function requires the response, which is not present in the scoring sample.

n = 1
set.seed(1)
x1 = rnorm(n,0)
x2 = rnorm(n,0)
score_data <- data.frame(x1,x2)

Thanks in advance!!
~ Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-the-probability-of-an-event-at-time-t-from-a-Cox-model-fit-tp4213318p4213318.html
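One possible route that needs only covariates at scoring time is survfit() from the survival package, which accepts a fitted coxph model together with a newdata argument and returns one predicted survival curve per row. A minimal sketch, not the peperr approach from the post: the simulation settings, the time point t0 and the names dev_data and event_prob are assumptions chosen for this illustration.

```r
library(survival)

set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
true_time <- rexp(n, rate = 0.1 * exp(0.8 * x1 - 0.5 * x2))  # simulated event times
cens_time <- rexp(n, rate = 0.05)                            # simulated censoring times
dev_data <- data.frame(
  time  = pmin(true_time, cens_time),
  event = as.integer(true_time <= cens_time),
  x1 = x1, x2 = x2
)

fit <- coxph(Surv(time, event) ~ x1 + x2, data = dev_data, method = "breslow")

# Scoring data: covariates only, no response
score_data <- data.frame(x1 = c(-1, 0, 1), x2 = c(0.5, 0, -0.5))

# One predicted survival curve per scoring row
sf <- survfit(fit, newdata = score_data)

t0 <- 5                                      # assumed time point of interest
surv_t0 <- summary(sf, times = t0)$surv      # S(t0 | x) for each row
score_data$event_prob <- 1 - as.numeric(surv_t0)
print(score_data)
```

Passing data = dev_data to coxph() keeps the fitted model tied to the development data frame, which makes prediction on a differently sized scoring set less error-prone than relying on variables in the workspace.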
[R] Creating and assigning variable names in loop
Hello List,

I am trying to create and assign variable names in a loop, but I am not able to get the expected variable names. Here is the sample code:

n = 10
set.seed(1)
x1 = rnorm(n,0)
x2 = rnorm(n,0)
samp_data <- data.frame(x1,x2)
for( i in 1:3) {
label <- paste("score", i, sep="_")
assign(label, value =x1+(x2*i) )
samp_data <- cbind(samp_data, get(label))
}

> head(samp_data)
          x1          x2 get(label) get(label) get(label)
1 -0.6264538  1.51178117  0.8853274  2.3971085  3.9088897
2  0.1836433  0.38984324  0.5734866  0.9633298  1.3531730
3 -0.8356286 -0.62124058 -1.4568692 -2.0781098 -2.6993504
4  1.5952808 -2.21469989 -0.6194191 -2.8341190 -5.0488189
5  0.3295078  1.12493092  1.4544387  2.5793696  3.7043005
6 -0.8204684 -0.04493361 -0.8654020 -0.9103356 -0.9552692

I am expecting the new variables created in samp_data to be named score_1, score_2, score_3 instead of get(label), get(label), get(label). Where am I going wrong?

Thanks in advance
~Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Creating-and-assigning-variable-names-in-loop-tp4221080p4221080.html
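cbind() names an unnamed argument after the expression that produced it, which is why every new column above is called get(label). A minimal sketch of one alternative, assigning each column by name with [[<- so that the label string becomes the column name and assign()/get() are not needed:

```r
n <- 10
set.seed(1)
x1 <- rnorm(n, 0)
x2 <- rnorm(n, 0)
samp_data <- data.frame(x1, x2)

for (i in 1:3) {
  label <- paste("score", i, sep = "_")
  # The string held in label is used as the new column's name
  samp_data[[label]] <- x1 + (x2 * i)
}

print(names(samp_data))
```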
[R] Subsetting data by eliminating redundant variables
Dear All,

I am new to R and have a question which might be easy. I have a large dataset with more than 250 variables, and I am reducing the number of variables with the redun function as in the example below:

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
outredun <- redun(~., data=data2, r2=.8,)
outredun
#outredun1 <- capture.output(redun(~., data=data2, r2=.8,))
#outredun1
#x25 <- outredun1[25]
#mydata12 <- daat1[myvars]
#myvars

For this example the console output gives me

Redundant variables: x6 x4 x3
Predicted from variables: x1 x2 x5

I want to subset my original data either by keeping the 'Predicted from variables' or by dropping the 'Redundant variables'. I have tried the capture.output function, as in the commented code above, but it gives me a string like "x1 x2 x5 ", which needs to be turned into "x1", "x2", "x5" to be usable for subsetting the data. My data has more than 250 variables, and the data and the number of variables change every time. How can this be achieved?

Thanks in advance for the help.
Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Subsetting-data-by-eliminating-redundant-variables-tp3918199p3918199.html
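The captured string from the post can be split into a character vector with base R and then used as a column index. A minimal sketch; the toy data frame stands in for data2, and the string is the one quoted in the post:

```r
# String as described in the post, including the trailing space
captured <- "x1 x2 x5 "

# Trim the ends, then split on runs of whitespace
keep_vars <- strsplit(trimws(captured), "[[:space:]]+")[[1]]
print(keep_vars)

# Toy stand-in for the real data frame
data2 <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9, x5 = c("a", "b", "c"))

kept    <- data2[, keep_vars, drop = FALSE]                     # keep listed variables
dropped <- data2[, !(names(data2) %in% c("x3")), drop = FALSE]  # or drop by name
```

Parsing printed output is fragile. It may be worth checking ?redun for your Hmisc version: as far as I recall, the returned object carries the selected and removed names as components (In and Out), which would let you write data2[, outredun$In] directly. Treat that as something to verify rather than as a given.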
[R] How to remove multiple outliers
Hi All,

I am working on a dataset in which some of the variables have more than one outlying observation. I am using the sample script below:

library(outliers)
x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10, 19, 18, 17, 10099, 10099, 10098)
outlier_tf1 = outlier(x1,logical=TRUE)
find_outlier1 = which(outlier_tf1==TRUE, arr.ind=TRUE)
beh_input_ro1 = x1[-find_outlier1]

It removes only the most extreme outliers and not all of them. In this example it removes only 10099, 10099 and not 10098.

Thanks for the help in advance.
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/How-to-remove-multiple-outliers-tp3921689p3921689.html
Re: [R] How to remove multiple outliers
Hi Michael,

Thanks for the help. Yes, I have gone through the documentation for ?outlier. As it removes one outlier at a time, and being new to R, I was wondering whether there is any function for removing multiple outliers without calling, say, rm.outlier n times, because n is not known in advance here.

On the second point, I am using the piece of code below because I get an error when rm.outlier with the fill = FALSE option is applied to the same dataset.

outlier_tf1 = outlier(x1,logical=TRUE)
find_outlier1 = which(outlier_tf1==TRUE, arr.ind=TRUE)
beh_input_ro1 = x1[-find_outlier1]

> library(outliers)
> beh_input_ro <- rm.outlier(beh_input_dr, fill = FALSE, median = FALSE,
> opposite = FALSE)
Error in data.frame(X1 = c(28.7812, 24.8923, 31.3987, 25.774, 27.1798, : arguments imply differing number of rows: 2398, 2390, 2399

Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/How-to-remove-multiple-outliers-tp3921689p3924904.html
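One way to drop several outliers in a single pass, without repeated calls to rm.outlier, is a boxplot-style rule based on the quartiles. This is a different technique from the one in the outliers package, and the multiplier 1.5 is only the conventional default, so take it as a minimal sketch on the x1 vector from the first post:

```r
x1 <- c(10, 10, 11, 12, 13, 14, 14, 10, 11, 13, 12, 13, 10,
        19, 18, 17, 10099, 10099, 10098)

q     <- quantile(x1, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * IQR(x1, na.rm = TRUE)

# Flag everything outside [Q1 - fence, Q3 + fence] at once
is_outlier <- x1 < q[1] - fence | x1 > q[2] + fence

x1_clean <- x1[!is_outlier]
print(x1[is_outlier])
print(x1_clean)
```

Here all three large values (10099, 10099 and 10098) are flagged together, because the rule compares every value with the same fences instead of looking only for the single most extreme one.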
[R] Data frame manipulation by eliminating rows containing extreme values
Dear All,

I have got the limits for removing extreme values for each variable using the following function.

f=function(x){quantile(x, c(0.25, 0.75),na.rm = TRUE) - matrix(IQR(x,na.rm = TRUE) * c(1.5), nrow = 1) %*% c(-1, 1)}

#Example:
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
xyz <- lapply(data1, f)

Now I can eliminate those rows (observations) from the data which contain extreme values, for each of the variables one by one, as below.

data2 <- subset (data2, x1<=xyz$x1[,1] & x1>=xyz$x1[,2])
data2 <- subset (data2, x1<=xyz$x2[,1] & x1>=xyz$x2[,2])
.
.
and so on.

But my data has many more variables (more than 120). Can anybody suggest an efficient way of eliminating rows containing extreme values?

Thanks in advance!
Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Data-frame-manipulation-by-eliminating-rows-containing-extreme-values-tp3927941p3927941.html
Re: [R] Data frame manipulation by eliminating rows containing extreme values
Hi David,

Thanks for the reply.

f=function(x){quantile(x, c(0.25, 0.75),na.rm = TRUE) - matrix(IQR(x,na.rm = TRUE) * c(1.5), nrow = 1) %*% c(-1, 1)}

The parameter 1.5 in the function above is set only as an example; it can be larger, maybe 3.0, after analyzing the actual data. The expectation is to find a cut-off on both sides (higher and lower values) for each variable, as in a box plot, and then to eliminate observations based on the cut-offs.

For the second point, I am extremely sorry. It was a typo: in xyz <- lapply(data1, f) it should be data2.

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
xyz <- lapply(data2, f)
str (xyz)

Now it is a list of six only:

List of 6
 $ x1: num [1, 1:2] 0.7797 0.0613
 $ x2: num [1, 1:2] 0.9533 0.0194
 $ x3: num [1, 1:2] 1.438 0.532
 $ x4: num [1, 1:2] 2.85 1.03
 $ x5: num [1, 1:2] 4 0
 $ x6: num [1, 1:2] 1.5 -0.5

The third point you mentioned is the problem to be resolved: at the moment I am overwriting data2, applying these cut-offs for each variable in turn. Is there any efficient way to do this?

data2 <- subset (data2, x1<=xyz$x1[,1] & x1>=xyz$x1[,2])
data2 <- subset (data2, x1<=xyz$x2[,1] & x1>=xyz$x2[,2])

On the last point you mentioned, I agree that removing "extreme values" is a serious distortion of the data. But in my data the values of some observations are set to a very high number like say . Also, this is not consistent across all variables in the data. So I can set a value higher than 1.5 in the function, get cut-offs for each variable, and remove such observations. As rm.outlier removes only one value, I am using the function above.

Thanks for the help in advance.
Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Data-frame-manipulation-by-eliminating-rows-containing-extreme-values-tp3927941p3929927.html
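The per-variable subset() calls can be replaced by computing one logical vector per numeric column and combining them, so the number of variables no longer matters. A minimal sketch under assumptions: the helper in_range(), its multiplier k, and the planted extreme values are made up for this illustration, and factor columns are simply skipped.

```r
set.seed(1)
n <- 100
data2 <- data.frame(
  x1 = runif(n),
  x2 = runif(n),
  x5 = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
data2$x1[c(3, 40)] <- 9999    # planted extreme values
data2$x2[7] <- -9999

# TRUE where a value lies inside [Q1 - k*IQR, Q3 + k*IQR]; NA values are kept
in_range <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * IQR(x, na.rm = TRUE)
  is.na(x) | (x >= q[1] - fence & x <= q[2] + fence)
}

num_cols <- vapply(data2, is.numeric, logical(1))

# A row is kept only if it is within range in every numeric column
keep <- Reduce(`&`, lapply(data2[num_cols], in_range))
data2_clean <- data2[keep, ]
print(dim(data2_clean))
```

Because every cut-off is computed on the original data before any row is removed, the result does not depend on the order in which variables are processed, unlike repeatedly overwriting data2.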
[R] How to get Quartiles when data contains both numeric variables and factors
When data contains both factor and numeric variables, how can I get quantiles for all numeric variables?

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- factor(1*(x5=='a' | x5=='c'))
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data <- data.frame(data1)
data <- within(data,{x5 <- factor(x5)})
x <- data
qs <- sapply(x, function(x) quantile(x, c(0.01, 0.99)))

I get an error:

Error in quantile.default(x, c(min_pct, max_pct)) : factors are not allowed

Thanks for the help.

--
View this message in context: http://r.789695.n4.nabble.com/How-to-get-Quartiles-when-data-contains-both-numeric-variables-and-factors-tp3955750p3955750.html
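The error comes from quantile() being applied to the factor column as well. A minimal sketch of restricting the call to numeric columns first; the small data frame is a cut-down stand-in for the one in the post:

```r
set.seed(1)
n <- 100
x <- data.frame(
  x1 = runif(n),
  x2 = runif(n),
  x5 = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# Logical index of the numeric columns
num_cols <- vapply(x, is.numeric, logical(1))

# quantile() is applied to numeric columns only; the result is a matrix
# with one column per variable and one row per requested probability
qs <- sapply(x[num_cols], quantile, probs = c(0.01, 0.99), na.rm = TRUE)
print(qs)
```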
[R] Creating deciles on data using one variable
I need to split data containing more than one variable into deciles based on one of the variables. I am using the script below:

id <-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
tot <-c(1230, 1230, 2345, 3456, 456, 4356, 123, 124, 987, 785, 5646, 345, 2345, 3456, 456, 4356, 123, 124, 987, 785)
data <- data.frame ( cbind(id , tot))
data$decile<-cut(data$tot,quantile(data$tot,(0:10)/10),include.lowest=TRUE,lable=TRUE)
data$decile

The new variable "decile" takes the values shown below, whereas I need it to take the values 1, 2, ..., 10. Where am I going wrong?

data$decile
 [1] (987,1.23e+03]      (987,1.23e+03]      (1.23e+03,2.34e+03]
 [4] (2.34e+03,3.46e+03] (301,456]           (3.46e+03,4.36e+03]
 [7] [123,124]           (124,301]           (785,987]
[10] (456,785]           (4.36e+03,5.65e+03] (301,456]
[13] (1.23e+03,2.34e+03] (2.34e+03,3.46e+03] (301,456]
[16] (3.46e+03,4.36e+03] [123,124]           (124,301]
[19] (785,987]           (456,785]

-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Creating-deciles-on-data-using-one-variable-tp3973086p3973086.html
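The argument of cut() is spelled labels, and setting labels = FALSE makes it return integer bin codes instead of interval labels. A minimal sketch on the data from the post, with only that argument changed:

```r
id  <- 1:20
tot <- c(1230, 1230, 2345, 3456, 456, 4356, 123, 124, 987, 785,
         5646, 345, 2345, 3456, 456, 4356, 123, 124, 987, 785)
data <- data.frame(id, tot)

# Decile boundaries: the 0%, 10%, ..., 100% quantiles of tot
breaks <- quantile(data$tot, (0:10) / 10)

# labels = FALSE returns the bin number (1 to 10) for each observation
data$decile <- cut(data$tot, breaks, include.lowest = TRUE, labels = FALSE)
print(data)
```

If the variable has many tied values, some quantiles can coincide and cut() will complain that the breaks are not unique; that does not happen with this particular vector.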
[R] Decision tree model using rpart ( classification
Hi Experts,

I am new to R and am using a decision tree model to get segmentation rules.

A) Using behavioural data (attributes defining customer behaviour, for example balances, number of accounts etc.)
1. Clustering: cluster the behavioural data into a suitable number of clusters.
2. Decision tree: use an rpart classification tree to generate segmentation rules, with the cluster number (cluster id) as target variable and variables from the behavioural data as input variables.

B) Using profile data (customers' demographic data)
1. Clustering: cluster the profile data into a suitable number of clusters.
2. Decision tree: use an rpart classification tree to generate segmentation rules, with the cluster number (cluster id) as target variable and variables from the profile data as input variables.

C) Using profile data (customers' demographic data) and deciles created from behaviour
1. Deciles: split customers into 10 groups based on some behavioural data.
2. Decision tree: use an rpart classification tree to generate segmentation rules, with the deciles as target variable and variables from the profile data as input variables.

In the first two cases, A and B, the rpart decision tree model finishes in a minute or two. But in the third case (C) it keeps running seemingly without end (I monitored it and it was still running after 14 hours).

fit <- rpart(decile ~., method="class",data=dtm_ip)

Is there anything wrong with my approach?

Thanks for the help in advance.
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Decision-tree-model-using-rpart-classification-tp3989162p3989162.html
Re: [R] Decision tree model using rpart ( classification
Hi,

Thanks for the response. The code for each case is:

c_c_factor <- 0.001
min_obs_split <- 80

A) fit <- rpart(segment ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=Beh_cluster_out)
B) fit <- rpart(segment ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=profile_cluster_out)
C) fit <- rpart(decile ~., method="class", control=rpart.control(minsplit=min_obs_split, cp=c_c_factor), data=dtm_ip)

In A and B the target variable 'segment' comes from clustering the same set of input variables, while in C the target variable 'decile' is derived from behavioural variables and the input variables are from the profile data. The number of rows in the input table is the same in all three cases.

Regards,
-Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Decision-tree-model-using-rpart-classification-tp3989162p3989320.html
[R] Assign value to new variable based on conditions on other variables
Hi Experts,

This may be a simple question. I want to create a new variable "seg" and assign values to it based on conditions satisfied by each observation. Here is the example:

##Below are the conditions
##if variable x2 gt 0 and x3 gt 200 then seg should take value 1,
##if variable x2 gt 100 and x3 gt 300 then seg should take value 2
##if variable x2 gt 200 and x3 gt 400 then seg should take value 3
##if variable x2 gt 300 and x3 gt 500 then seg should take value 4
id <- c(1,2,3,4,5)
x2 <- c(200,100,400,500,600)
x3 <- c(300,400,500,600,700)
dd <- data.frame(id,x2,x3)
dd$seg[dd$x2> 0 && dd$x3> 200] <-1
dd$seg[dd$x2> 100 && dd$x3> 300] <-2
dd$seg[dd$x2> 200 && dd$x3> 400] <-3
dd$seg[dd$x2> 300 && dd$x3> 500] <-4

I tried as above but it is not working for me. What is the correct and efficient way to do this?

Thanks for the help in advance!!

--
View this message in context: http://r.789695.n4.nabble.com/Assign-value-to-new-variable-based-on-conditions-on-other-variables-tp4544753p4544753.html
Re: [R] Assign value to new variable based on conditions on other variables
I have got a solution using the within function, as below:

dd$Seg <- 1
dd <- within(dd, Seg[x2> 0 & x3> 200] <- 1)
dd <- within(dd, Seg[x2> 100 & x3> 300] <- 2)
dd <- within(dd, Seg[x2> 200 & x3> 400] <- 3)
dd <- within(dd, Seg[x2> 300 & x3> 500] <- 4)

Is there any better way of doing it?

--
View this message in context: http://r.789695.n4.nabble.com/Assign-value-to-new-variable-based-on-conditions-on-other-variables-tp4544753p4544795.html
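The first attempt fails because && compares single values, whereas & works element by element, which is what indexing needs. A minimal sketch of the same rules written once as two cut-off vectors and applied in a loop; the names x2_cut and x3_cut are made up for this illustration, and rows matching no rule are left as NA here rather than defaulting to 1:

```r
dd <- data.frame(
  id = 1:5,
  x2 = c(200, 100, 400, 500, 600),
  x3 = c(300, 400, 500, 600, 700)
)

# Rule k applies when x2 > x2_cut[k] and x3 > x3_cut[k]
x2_cut <- c(0, 100, 200, 300)
x3_cut <- c(200, 300, 400, 500)

dd$seg <- NA_integer_
for (k in seq_along(x2_cut)) {
  # Later (stricter) rules overwrite earlier ones, as in the within() version
  dd$seg[dd$x2 > x2_cut[k] & dd$x3 > x3_cut[k]] <- k
}
print(dd)
```

Keeping the cut-offs in vectors means that adding a fifth segment only requires one more element in each vector.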
[R] Java heap space Error while reading table from postgres database using RJDBC
Hi List,

I am reading a table from a postgres database into an R session using RJDBC. The table contains 150 columns and 20 rows. The sample code is below; it works fine with smaller tables.

db_driver <- mydir$db_driver
db_jar_file <- mydir$db_jar_file
db_server <- mydir$db_server
db_server_lgn <- mydir$db_server_lgn
db_server_pwd <- mydir$db_server_pwd
library(RJDBC)
drv <- JDBC(paste(db_driver, sep = ""), paste(db_jar_file, sep = ""), identifier.quote="`")
conn <- dbConnect(drv, paste(db_server, sep = ""), paste(db_server_lgn, sep = ""), paste(db_server_pwd, sep = ""))
cs_input_abt <- dbReadTable(conn, "cs_input_abt")

The following errors occur after executing the script above; a different one appears each time it is run.

1. Error in .jcall(rp, "I", "fetch", stride) : java.lang.OutOfMemoryError: Java heap space
2. Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve JDBC result set for SELECT * FROM cs_input_abt (Could not initialize class org.postgresql.util.PSQLException)
3. Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve JDBC result set for SELECT * FROM bs_modelling_abt (GC overhead limit exceeded)

Where am I going wrong? Is there any option which I have not used in the RJDBC connection or need to add?

Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Java-heap-space-Error-while-reading-table-from-postgres-database-using-RJDBC-tp4372816p4372816.html
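The OutOfMemoryError and "GC overhead limit exceeded" messages point at the size of the Java heap rather than at the query. A commonly suggested remedy is to raise the JVM heap limit through the java.parameters option before rJava or RJDBC is loaded. A minimal sketch; the 4 GB figure and the chunk size are assumptions to adapt to the machine, and the database calls are left commented out because they need a live connection:

```r
# Must run in a fresh R session, before rJava/RJDBC is loaded:
# the JVM reads this option only once, when it starts
options(java.parameters = "-Xmx4g")   # assumed heap size of 4 GB

# library(RJDBC)                      # load only after the option is set
# drv  <- JDBC(db_driver, db_jar_file, identifier.quote = "`")
# conn <- dbConnect(drv, db_server, db_server_lgn, db_server_pwd)

# Optional: read the table in pieces instead of all at once
# res <- dbSendQuery(conn, "SELECT * FROM cs_input_abt")
# repeat {
#   chunk <- fetch(res, n = 50000)    # assumed chunk size
#   if (nrow(chunk) == 0) break
#   # ... process or append chunk here ...
# }
# dbClearResult(res)

print(getOption("java.parameters"))
```

If the JVM has already been started in the session, changing the option has no effect until R is restarted.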
[R] Passing date as parameter while retrieving data from database using dbGetQuery
Hi All,

This might be a simple question. I need to retrieve data for modelling from the database. The date values change every time, so I cannot fix the date values in the code; they have to be passed as parameters. When I pass the date as a parameter, it throws an error (ERROR: column "start_dt" does not exist Position: 285).

My script is below; please guide me as to where I am going wrong. All other parameters are passed correctly, and when start_dt and end_dt are replaced by '2010-11-01' and '2011-01-31' respectively in the query, the code works fine without any errors.

#
db_driver <- mydir$db_driver
db_jar_file <- mydir$db_jar_file
db_server <- mydir$db_server
db_server_lgn <- mydir$db_server_lgn
db_server_pwd <- mydir$db_server_pwd
library(RJDBC)
.jinit(classpath="myClasses.jar", parameters="-Xmx4096m")
drv <- JDBC(paste(db_driver, sep = ""), paste(db_jar_file, sep = ""), identifier.quote="`")
conn <- dbConnect(drv, paste(db_server, sep = ""), paste(db_server_lgn, sep = ""), paste(db_server_pwd, sep = ""))
start_dt <- as.Date('2010-11-01',format="%Y-%m-%d")
end_dt <- as.Date('2011-01-31',format="%Y-%m-%d")
library(sqldf)
target_population <- dbGetQuery(conn, "select distinct a.primary_customer_code as cust_id, a.primary_product_code, a.account_opening_date, b.l4_product_hierarchy_code, b.l5_product_hierarchy_code from account_dim a, product_dim b where a.primary_product_code=b.l5_product_hierarchy_code and a.account_opening_date between start_dt and end_dt")

As it is not possible to reproduce the error with the code above, I am providing a sample example below with the sqldf function on a data frame.

date_tm <- as.Date(c('2010-11-01', '2011-11-01','2010-12-01', '2011-01-01', '2011-02-01'))
x1 <- c(1,2,3,4,5)
x2 <- c(100,200,300,400,500)
test_data <- data.frame(x1,x2,date_tm)
test_data
start_dt <- as.Date('2011-01-01',format="%Y-%m-%d") #Passing as parameter
end_dt <- as.Date('2011-02-31',format="%Y-%m-%d") #Passing as parameter
library(sqldf)
new_data <- sqldf("select * from test_data where date_tm = start_dt")

It shows a similar error when the date is passed through the parameter start_dt (error in statement: no such column: start_dt).

~Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Passing-date-as-parameter-while-retrieving-data-from-database-using-dbGetQuery-tp4390216p4390216.html
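Inside the quoted SQL, start_dt and end_dt are just text, so the database looks for columns with those names. A minimal sketch of building the query string in R so that the date values themselves end up in the SQL; the shortened query is illustrative, and the dbGetQuery() call is commented out because it needs the live connection from the post:

```r
start_dt <- as.Date("2010-11-01", format = "%Y-%m-%d")
end_dt   <- as.Date("2011-01-31", format = "%Y-%m-%d")

# sprintf() substitutes the formatted dates into the quoted placeholders
query <- sprintf(
  "select * from account_dim a where a.account_opening_date between '%s' and '%s'",
  format(start_dt, "%Y-%m-%d"),
  format(end_dt, "%Y-%m-%d")
)
cat(query, "\n")

# target_population <- dbGetQuery(conn, query)
```

Pasting values into SQL is reasonable here because they come from as.Date() and can only be well-formed dates; for free-text input a parameterised query would be the safer choice.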
[R] Issues while using “lift.chart” and “adjProbScore” function from ”BCA” library
Dear List,

I have a couple of issues while using functions from the “BCA” library:

1. I am trying to use the “lift.chart” function from the “BCA” library, but I face issues with a model whose formula is passed to glm as a formula object. When the model formula is written out as text, it works fine. In my case the input variables and target variables change dynamically, so I have to use a derived formula object. Below is sample code, taken from the package documentation, to illustrate the issue.

library(BCA)
data(CCS)
CCS$Sample <- create.samples(CCS, est=0.4, val=0.4)
CCSEst <- CCS[CCS$Sample == "Estimation",]
#Fit glm model with formula written as text
CCS.glm <- glm(MonthGive ~ DonPerYear + LastDonAmt + Region + YearsGive, family=binomial(logit), data=CCSEst)
CCSVal <- CCS[CCS$Sample == "Validation",]
lift.chart(c("CCS.glm"), data=CCSVal, targLevel="Yes", trueResp=0.01, type="incremental", sub="Validation")
#Fit glm model with formula passed as formula object
fm <- as.formula("MonthGive ~ DonPerYear + LastDonAmt + Region + YearsGive")
CCS.glm12 <- glm(fm,family=binomial(logit), data=CCSEst)
lift.chart(c("CCS.glm12"), data=CCSVal, targLevel="Yes", trueResp=0.01, type="incremental", sub="Validation")

The following error occurs:

Error in if (any(yvar1 != yvar1[1])) { : missing value where TRUE/FALSE needed

Is there any way to use a formula object in the model together with the “lift.chart” function?

2. Issue using the “adjProbScore” function from the “BCA” library.

(adjProbScore(model="CCS.glm", data=CCSVal1, targLevel="Yes", trueResp=0.01))

Error in parse(text = paste("as.character(", ActiveDataSet(), "$", yvar, : :1:16: unexpected '$'
1: as.character( $
                  ^

The error above is thrown. Am I doing anything wrong? Please correct me. Also, as in case 1 above, can a model fitted with a formula object be used in the “adjProbScore” function?

Thanks in advance!
Ajit

--
View this message in context: http://r.789695.n4.nabble.com/Issues-while-using-lift-chart-and-adjProbScore-function-from-BCA-library-tp4631158.html